transformer_engine installation fails #4051

zhangtianhong-1998 opened this issue Apr 30, 2025 · 7 comments

@zhangtianhong-1998

cuda 12.4
python 3.10
torch 2.6.0
I followed two closed issues, but the problem is still unresolved.
pip install 'ms-swift'
pip install pybind11

Note 1: I mirrored the repository on Gitee to get around network restrictions.

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES && \
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include \
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git@stable
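
For readers who want to check the two paths before starting the long build, the same lookup can be sketched in Python. This is not part of the original report: the helper names cudnn_env and missing_dirs are hypothetical, and the sketch assumes cuDNN sits under nvidia/cudnn inside site-packages, as the command implies.

```python
import os
import site


def cudnn_env(site_packages):
    """Return the two variables the install command sets, for a given site-packages dir."""
    cudnn = os.path.join(site_packages, "nvidia", "cudnn")
    return {
        "CUDNN_PATH": cudnn,
        "CPLUS_INCLUDE_PATH": os.path.join(cudnn, "include"),
    }


def missing_dirs(env):
    """List the variables whose directory does not exist on this machine."""
    return [name for name, path in env.items() if not os.path.isdir(path)]


# Same lookup as the shell one-liner: first entry of site.getsitepackages().
env = cudnn_env(site.getsitepackages()[0])
for name in missing_dirs(env):
    print(f"{name} points at a missing directory: {env[name]}")
```

If nothing is printed, both directories exist. The dictionary can then be merged into os.environ and passed as env= to subprocess.run for the pip install call.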

Note 2: Both pip install 'ms-swift[all]' -U and the full install from source, pip install -e '.[all]', fail with the error shown below, so I installed only pip install 'ms-swift'.

Collecting binpacking (from ms-swift[all])
Using cached binpacking-1.5.2-py3-none-any.whl
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/fa/1c/d85aa7b120c09615c6d0f791fe581d42eb1fb062478fdc25a4e95dc88113/binpacking-1.5.1.tar.gz (9.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/83/08/5fb79fafc4c857d6712a24250b1fdba6aa3821b9492ccc239a05bf6ccfbf/binpacking-1.5.0.tar.gz (9.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e8/74/fd61be713a1bfe72a7394bc4fe9cb5fc70d0aaf4a4b49a2e8152eed67a59/binpacking-1.4.5.tar.gz (8.9 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/53/b3/2796bc69236c624e46ba02b4e11c3c8d66193ce2124a03c11db190176bfe/binpacking-1.4.3.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e6/de/5e565925472c7f9a987525cb6b49ac32a228fe203cd76c207d041683d40c/binpacking-1.4.2.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/dc/97/7e632f6dcd46c806160211d1e9a5cda1641cbb1a74fb5967024c5aa52ed5/binpacking-1.4.1.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/7a/9a/c336fe2f0546f17d945e6f9f6bc06b8b306d10750b20ec6e12715c32f7f8/binpacking-1.4.tar.gz (5.8 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/c9/fe/56782753922a195d332d419949f889c1d59cab7b1780db2351bd8b99501c/binpacking-1.3.tar.gz (5.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/9b/e4/a7ee63c0f201c5edb5817e36f964c571112fc00b23e8887bee4b41ac97f4/binpacking-1.2.tar.gz (5.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/51/d6/a26db6fd38fba493c3bfbd51e91b14a985bcc08dcf2900a9fd850f3b8507/binpacking-1.1.tar.gz (5.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/d0/eb/7a7e6f4be7376260e97879cf51f1e3b9ff614f31e97355b3e26a587a2535/binpacking-1.0.tar.gz (5.1 kB)
Preparing metadata (setup.py) ... done
Collecting attrdict (from ms-swift[all])
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/ef/97/28fe7e68bc7adfce67d4339756e85e9fcf3c6fd7f0c0781695352b70472c/attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
error: resolution-too-deep

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.
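
The hint above asks for lower bounds. One way to get candidates is to read them off the log itself, taking the newest version of each package that pip looked at. The sketch below does that with the standard library only. The function name lower_bounds is hypothetical, the version parsing handles plain dotted numbers only, and the thread does not confirm that these bounds are enough to make the resolver finish.

```python
import re

# Matches sdist and wheel file names such as "binpacking-1.5.1.tar.gz"
# or "attrdict-2.0.1-py2.py3-none-any.whl".
_FILE = re.compile(r"([A-Za-z0-9_.]+?)-(\d+(?:\.\d+)*)(?:\.tar\.gz|-.*\.whl)")


def lower_bounds(pip_log):
    """Map each package seen in the log to the newest version pip looked at."""
    newest = {}
    for token in pip_log.split():
        # Keep only the file name when the token is a full URL.
        match = _FILE.fullmatch(token.rsplit("/", 1)[-1])
        if not match:
            continue
        name = match.group(1)
        version = tuple(int(part) for part in match.group(2).split("."))
        if version > newest.get(name, ()):
            newest[name] = version
    return [f"{name}>={'.'.join(map(str, v))}" for name, v in sorted(newest.items())]


# Three lines copied from the log above.
log = """
Using cached binpacking-1.5.2-py3-none-any.whl
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/7a/9a/c336fe2f0546f17d945e6f9f6bc06b8b306d10750b20ec6e12715c32f7f8/binpacking-1.4.tar.gz (5.8 kB)
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/ef/97/28fe7e68bc7adfce67d4339756e85e9fcf3c6fd7f0c0781695352b70472c/attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
"""
print("\n".join(lower_bounds(log)))
# attrdict>=2.0.1
# binpacking>=1.5.2
```

Saving the printed lines as constraints.txt and adding -c constraints.txt to the pip install command stops pip from backtracking through the older binpacking releases.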

stable version

Command used

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES && \
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include \
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git@stable

              instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::relu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
              instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::relu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::relu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/relu.cu

  [41/43] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/.. -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-arnq_5jt/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-arnq_5jt/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/activation/gelu.cu.o -MF CMakeFiles/transformer_engine.dir/activation/gelu.cu.o.d -x cu -c /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu -o CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_kernels.cuh(930): warning #177-D: variable "input_shape" was declared but never referenced
      const auto &input_shape = input.data.shape;
                  ^
            detected during:
              instantiation of "void transformer_engine::fp8_quantize_arch_ge_100<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1209
              instantiation of "void transformer_engine::fp8_quantize<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1255
              instantiation of "void transformer_engine::detail::quantize_helper<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 36 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::act_fn<ComputeType,Param,OP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 13 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_gated_kernels.cuh(829): warning #177-D: variable "amax_ptr" was declared but never referenced
      float *const amax_ptr = reinterpret_cast<float *>(output->amax.dptr);
                   ^
            detected during:
              instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
              instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::gelu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_kernels.cuh(930): warning #177-D: variable "input_shape" was declared but never referenced
      const auto &input_shape = input.data.shape;
                  ^
            detected during:
              instantiation of "void transformer_engine::fp8_quantize_arch_ge_100<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1209
              instantiation of "void transformer_engine::fp8_quantize<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1255
              instantiation of "void transformer_engine::detail::quantize_helper<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 36 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::act_fn<ComputeType,Param,OP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 13 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_gated_kernels.cuh(829): warning #177-D: variable "amax_ptr" was declared but never referenced
      float *const amax_ptr = reinterpret_cast<float *>(output->amax.dptr);
                   ^
            detected during:
              instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
              instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::gelu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_kernels.cuh(930): warning #177-D: variable "input_shape" was declared but never referenced
      const auto &input_shape = input.data.shape;
                  ^
            detected during:
              instantiation of "void transformer_engine::fp8_quantize_arch_ge_100<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1209
              instantiation of "void transformer_engine::fp8_quantize<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1255
              instantiation of "void transformer_engine::detail::quantize_helper<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 36 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::act_fn<ComputeType,Param,OP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 13 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_gated_kernels.cuh(829): warning #177-D: variable "amax_ptr" was declared but never referenced
      float *const amax_ptr = reinterpret_cast<float *>(output->amax.dptr);
                   ^
            detected during:
              instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
              instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::gelu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_kernels.cuh(930): warning #177-D: variable "input_shape" was declared but never referenced
      const auto &input_shape = input.data.shape;
                  ^
            detected during:
              instantiation of "void transformer_engine::fp8_quantize_arch_ge_100<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1209
              instantiation of "void transformer_engine::fp8_quantize<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1255
              instantiation of "void transformer_engine::detail::quantize_helper<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 36 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::act_fn<ComputeType,Param,OP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 13 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_gated_kernels.cuh(829): warning #177-D: variable "amax_ptr" was declared but never referenced
      float *const amax_ptr = reinterpret_cast<float *>(output->amax.dptr);
                   ^
            detected during:
              instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
              instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::gelu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  [42/43] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/.. -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-arnq_5jt/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-arnq_5jt/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o -MF CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o.d -x cu -c /tmp/pip-req-build-arnq_5jt/transformer_engine/common/transpose/cast_transpose_fusion.cu -o CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 89, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/root/anaconda3/envs/ms/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-arnq_5jt/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 35, in <module>
    File "/tmp/pip-req-build-arnq_5jt/setup.py", line 179, in <module>
      setuptools.setup(
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
      return distutils.core.setup(**attrs)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-arnq_5jt/setup.py", line 53, in run
      super().run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/wheel/_bdist_wheel.py", line 387, in run
      self.run_command("build")
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
      self.run_command(cmd_name)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 119, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 91, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-arnq_5jt/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)
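
The excerpt above ends with "ninja: build stopped: subcommand failed.", but the compiler output shown before it consists only of "warning #177-D" diagnostics, so the line that actually failed is presumably earlier in the full output. The sketch below scans a saved log for lines that usually mark the real failure. The function name first_failures and the pattern list are choices made for this sketch, a heuristic rather than an official list.

```python
import re

# Heuristic markers: Ninja's "FAILED:" prefix, compiler "error:" / "error #NNN:"
# diagnostics, and "Killed" (typical when the OS stops a compiler for lack of memory).
_ERROR = re.compile(r"^\s*FAILED:|\berror(?: #[\w-]+)?:|\bKilled\b")


def first_failures(build_log, limit=5):
    """Return up to `limit` (line_number, text) pairs that look like real errors."""
    hits = []
    for number, line in enumerate(build_log.splitlines(), start=1):
        if _ERROR.search(line):
            hits.append((number, line.strip()))
            if len(hits) == limit:
                break
    return hits
```

To use it, capture the whole build output to a file (for example by appending 2>&1 | tee build.log to the pip install command) and pass the file's text to first_failures. The matching is case-sensitive on purpose, so Python traceback lines such as "RuntimeError: Error when running CMake" are not reported.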

Latest version

Command used

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES && \
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include \
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git

  [44/45] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-216bf86l/transformer_engine/common/.. -I/tmp/pip-req-build-216bf86l/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-216bf86l/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-216bf86l/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-216bf86l/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o -MF CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o.d -x cu -c /tmp/pip-req-build-216bf86l/transformer_engine/common/transpose/cast_transpose_fusion.cu -o CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 88, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/root/anaconda3/envs/ms/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-216bf86l/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 35, in <module>
    File "/tmp/pip-req-build-216bf86l/setup.py", line 187, in <module>
      setuptools.setup(
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
      return distutils.core.setup(**attrs)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-216bf86l/setup.py", line 51, in run
      super().run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/wheel/_bdist_wheel.py", line 387, in run
      self.run_command("build")
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
      self.run_command(cmd_name)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 120, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 90, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-216bf86l/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Building wheel for nvdlfw-inspect (pyproject.toml) ... done
Created wheel for nvdlfw-inspect: filename=nvdlfw_inspect-0.1.0-py3-none-any.whl size=30813 sha256=e151bc54367e558b8ecd48e00b6fe23645dd5a18be9c4bea0af5101809f4ee62
Stored in directory: /tmp/pip-ephem-wheel-cache-9m88qhu1/wheels/6f/b1/55/1a653c8ad54c41e4081205176009cc4cfc7f06ffc781fa6d0a
Successfully built nvdlfw-inspect
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)
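
Both logs show each .cu file being compiled with several --generate-code flags, and CMake being invoked with --parallel and no job count. The sketch below only reads the target architectures out of an nvcc command line so they can be compared with the GPU actually installed. The function name target_archs is hypothetical, and nothing in the thread establishes that the number of architectures or parallel jobs is what broke the build.

```python
import re


def target_archs(nvcc_command):
    """Return the de-duplicated compute capabilities named in --generate-code flags."""
    found = re.findall(r"--generate-code=arch=compute_(\d+[a-z]?)", nvcc_command)
    # Sort numerically on the digits so that "100" would come after "90".
    return sorted(set(found), key=lambda arch: int(re.match(r"\d+", arch).group()))


# Flags copied from the nvcc command in the log above.
flags = (
    "--generate-code=arch=compute_70,code=[compute_70,sm_70] "
    "--generate-code=arch=compute_80,code=[compute_80,sm_80] "
    "--generate-code=arch=compute_89,code=[compute_89,sm_89] "
    "--generate-code=arch=compute_90,code=[compute_90,sm_90]"
)
print(target_archs(flags))
# ['70', '80', '89', '90']
```

If the TransformerEngine version being built honors an environment variable such as NVTE_CUDA_ARCHS (an assumption to verify against its installation documentation), limiting the list to the local GPU's architecture should reduce the amount of compilation.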

@Jintao-Huang
Collaborator

I'd suggest using the image; this package really is hard to install.

@zhangtianhong-1998
Author

> I'd suggest using the image; this package really is hard to install.

That's a bit disheartening. Thanks.

@zhangtianhong-1998
Author

zhangtianhong-1998 commented Apr 30, 2025

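> I'd suggest using the image; this package really is hard to install.

One question: the current image (0.8.3) does not seem to support Qwen3. Should I just upgrade?
Also, I don't see any parameter configuration for DeepSpeed. With zero3, are the parameters and optimizer parameters offloaded to the CPU by default?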
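
On the offload question: in DeepSpeed's own configuration format, CPU offload under ZeRO stage 3 is opt-in and is switched on by the offload_optimizer and offload_param sections. What the zero3 preset shipped with swift contains is not stated in this thread, so that file should be checked directly. The sketch below, with a hypothetical helper name with_cpu_offload, shows what an explicit offload configuration looks like.

```python
import copy
import json


def with_cpu_offload(ds_config):
    """Return a copy of a DeepSpeed config with ZeRO-3 parameter and optimizer offload to CPU."""
    config = copy.deepcopy(ds_config)
    zero = config.setdefault("zero_optimization", {})
    zero["stage"] = 3
    zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return config


# A bare ZeRO-3 config: no offload sections, so nothing is moved to the CPU.
zero3 = {"zero_optimization": {"stage": 3}}
print(json.dumps(with_cpu_offload(zero3), indent=2))
```

The input dictionary is left untouched, so the two variants can be written to separate JSON files and compared.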

@Jintao-Huang
Collaborator

Just upgrade swift directly.

@Jintao-Huang
Collaborator

@zhangansen

@zhangtianhong-1998 My installation failed too. What does "image" refer to here?

@zhangtianhong-1998
Author

> @zhangtianhong-1998 My installation failed too. What does "image" refer to here?

The Docker image.
