transformer_engine installation fails #4051

Closed
@zhangtianhong-1998

cuda 12.4
python 3.10
torch 2.6.0
I consulted two closed issues, but the problem is still unresolved.
pip install 'ms-swift'
pip install pybind11

Note 1: I mirrored the repository on Gitee to work around network restrictions.

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES && \
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include \
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git@stable
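
The command above points CUDNN_PATH at site-packages/nvidia/cudnn, which only helps if cuDNN was actually installed there from a pip wheel. Below is a minimal sketch for confirming that layout before starting the long build. check_cudnn_root is a hypothetical helper, and the cudnn.h file name under include/ is an assumption about how the wheel lays out its headers.

```shell
#!/usr/bin/env bash
# Sketch: verify the pip-installed cuDNN layout before building TransformerEngine.
# Assumption: the cuDNN wheel unpacks into <site-packages>/nvidia/cudnn and
# ships its headers in an include/ subdirectory containing cudnn.h.

# Hypothetical helper: succeeds only if <cudnn_root>/include/cudnn.h exists.
check_cudnn_root() {
    local cudnn_root="$1"
    if [ -f "$cudnn_root/include/cudnn.h" ]; then
        echo "found cuDNN headers in $cudnn_root/include"
        return 0
    fi
    echo "cudnn.h not found under $cudnn_root/include" >&2
    return 1
}

# Example use (run it in the same environment as the pip install):
#   SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
#   check_cudnn_root "$SITE_PACKAGES/nvidia/cudnn"
```

If the check fails, CUDNN_PATH and CPLUS_INCLUDE_PATH point at a directory the compiler cannot use, and the build would need a different cuDNN location.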

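Note 2: Both pip install 'ms-swift[all]' -U and the full install from source, pip install -e '.[all]', fail with the resolution-too-deep error shown below, so I installed only the base package with pip install 'ms-swift'.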

Collecting binpacking (from ms-swift[all])
Using cached binpacking-1.5.2-py3-none-any.whl
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/fa/1c/d85aa7b120c09615c6d0f791fe581d42eb1fb062478fdc25a4e95dc88113/binpacking-1.5.1.tar.gz (9.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/83/08/5fb79fafc4c857d6712a24250b1fdba6aa3821b9492ccc239a05bf6ccfbf/binpacking-1.5.0.tar.gz (9.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e8/74/fd61be713a1bfe72a7394bc4fe9cb5fc70d0aaf4a4b49a2e8152eed67a59/binpacking-1.4.5.tar.gz (8.9 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/53/b3/2796bc69236c624e46ba02b4e11c3c8d66193ce2124a03c11db190176bfe/binpacking-1.4.3.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e6/de/5e565925472c7f9a987525cb6b49ac32a228fe203cd76c207d041683d40c/binpacking-1.4.2.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/dc/97/7e632f6dcd46c806160211d1e9a5cda1641cbb1a74fb5967024c5aa52ed5/binpacking-1.4.1.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/7a/9a/c336fe2f0546f17d945e6f9f6bc06b8b306d10750b20ec6e12715c32f7f8/binpacking-1.4.tar.gz (5.8 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/c9/fe/56782753922a195d332d419949f889c1d59cab7b1780db2351bd8b99501c/binpacking-1.3.tar.gz (5.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/9b/e4/a7ee63c0f201c5edb5817e36f964c571112fc00b23e8887bee4b41ac97f4/binpacking-1.2.tar.gz (5.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/51/d6/a26db6fd38fba493c3bfbd51e91b14a985bcc08dcf2900a9fd850f3b8507/binpacking-1.1.tar.gz (5.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/d0/eb/7a7e6f4be7376260e97879cf51f1e3b9ff614f31e97355b3e26a587a2535/binpacking-1.0.tar.gz (5.1 kB)
Preparing metadata (setup.py) ... done
Collecting attrdict (from ms-swift[all])
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/ef/97/28fe7e68bc7adfce67d4339756e85e9fcf3c6fd7f0c0781695352b70472c/attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
error: resolution-too-deep

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.
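
The hint above suggests adding lower bounds. One way to try that without editing the package's own requirements is a pip constraints file, sketched below. write_constraints is a hypothetical helper. The binpacking bound comes from the newest version in the log above. Whether this one pin is enough to stop the backtracking is an assumption, and other packages may need bounds as well.

```shell
#!/usr/bin/env bash
# Sketch: pin lower bounds in a constraints file so pip does not walk back
# through old releases. Only the binpacking version is taken from the log;
# any further pins are guesses to adjust per environment.

# Hypothetical helper: write one "name>=version" line per name/version pair.
write_constraints() {
    local out="$1"; shift
    : > "$out"
    while [ "$#" -ge 2 ]; do
        printf '%s>=%s\n' "$1" "$2" >> "$out"
        shift 2
    done
}

write_constraints constraints.txt binpacking 1.5.2
cat constraints.txt

# Then, with network access:
#   pip install 'ms-swift[all]' -U -c constraints.txt
```

The -c option only restricts which versions pip may choose. It does not install anything that is not already required.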

stable version

Command used

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES && \
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include \
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git@stable

              instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::relu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
              instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::relu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::relu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/relu.cu

  [41/43] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/.. -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-arnq_5jt/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-arnq_5jt/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/activation/gelu.cu.o -MF CMakeFiles/transformer_engine.dir/activation/gelu.cu.o.d -x cu -c /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu -o CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_kernels.cuh(930): warning #177-D: variable "input_shape" was declared but never referenced
      const auto &input_shape = input.data.shape;
                  ^
            detected during:
              instantiation of "void transformer_engine::fp8_quantize_arch_ge_100<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1209
              instantiation of "void transformer_engine::fp8_quantize<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1255
              instantiation of "void transformer_engine::detail::quantize_helper<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 36 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::act_fn<ComputeType,Param,OP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 13 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_gated_kernels.cuh(829): warning #177-D: variable "amax_ptr" was declared but never referenced
      float *const amax_ptr = reinterpret_cast<float *>(output->amax.dptr);
                   ^
            detected during:
              instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
              instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
              instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::gelu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu

  [42/43] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/.. -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-arnq_5jt/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-arnq_5jt/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o -MF CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o.d -x cu -c /tmp/pip-req-build-arnq_5jt/transformer_engine/common/transpose/cast_transpose_fusion.cu -o CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 89, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/root/anaconda3/envs/ms/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-arnq_5jt/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 35, in <module>
    File "/tmp/pip-req-build-arnq_5jt/setup.py", line 179, in <module>
      setuptools.setup(
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
      return distutils.core.setup(**attrs)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-arnq_5jt/setup.py", line 53, in run
      super().run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/wheel/_bdist_wheel.py", line 387, in run
      self.run_command("build")
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
      self.run_command(cmd_name)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 119, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 91, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-arnq_5jt/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)
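
The excerpt above contains only #177-D warnings before "ninja: build stopped: subcommand failed", so the diagnostic that actually stopped the build is not visible in it. Below is a minimal sketch for pulling the likely culprits out of a saved log. find_build_errors is a hypothetical helper, and the patterns (ninja's FAILED: lines, compiler "error:" lines, and "Killed" from an out-of-memory kill) are assumptions about what the failure looks like.

```shell
#!/usr/bin/env bash
# Sketch: extract probable failure lines from a saved build log.
# Assumption: the log was captured beforehand, for example with
#   pip install -v <package> 2>&1 | tee build.log

# Hypothetical helper: print matching lines prefixed with their line numbers.
# The match is case-sensitive on purpose, so "warning #177-D" diagnostics and
# Python traceback lines such as "CalledProcessError:" are not reported.
find_build_errors() {
    local log="$1"
    grep -n -E 'FAILED:|error:|error #|fatal error|Killed' "$log"
}

# Example use:
#   find_build_errors build.log
```

The line numbers in the output make it easier to open the log at the first failing compile step and read the surrounding context.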

Latest version

Command used

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES && \
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include \
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git

  [44/45] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-216bf86l/transformer_engine/common/.. -I/tmp/pip-req-build-216bf86l/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-216bf86l/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-216bf86l/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-216bf86l/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o -MF CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o.d -x cu -c /tmp/pip-req-build-216bf86l/transformer_engine/common/transpose/cast_transpose_fusion.cu -o CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 88, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/root/anaconda3/envs/ms/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-216bf86l/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 35, in <module>
    File "/tmp/pip-req-build-216bf86l/setup.py", line 187, in <module>
      setuptools.setup(
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
      return distutils.core.setup(**attrs)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-216bf86l/setup.py", line 51, in run
      super().run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/wheel/_bdist_wheel.py", line 387, in run
      self.run_command("build")
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
      self.run_command(cmd_name)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 120, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 90, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-216bf86l/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Building wheel for nvdlfw-inspect (pyproject.toml) ... done
Created wheel for nvdlfw-inspect: filename=nvdlfw_inspect-0.1.0-py3-none-any.whl size=30813 sha256=e151bc54367e558b8ecd48e00b6fe23645dd5a18be9c4bea0af5101809f4ee62
Stored in directory: /tmp/pip-ephem-wheel-cache-9m88qhu1/wheels/6f/b1/55/1a653c8ad54c41e4081205176009cc4cfc7f06ffc781fa6d0a
Successfully built nvdlfw-inspect
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)
