Description
CUDA 12.4
Python 3.10
torch 2.6.0
I followed two closed issues on this, but the problem remains unresolved.
pip install 'ms-swift'
pip install pybind11
Note 1: I mirrored the repository on Gitee to get around network restrictions, then installed TransformerEngine with:
SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES &&
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git@stable
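One thing worth checking: if the CUDNN_PATH / CPLUS_INCLUDE_PATH assignments were actually entered on a separate line (rather than one long command that merely wrapped when pasted), they never reach the pip subprocess and the build cannot find the cuDNN headers. A minimal sketch of the intended form with explicit exports, using the same paths and Gitee mirror as above (an assumption about intent, not something verified here):

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
echo $SITE_PACKAGES
# export so the variables survive across lines and are inherited by pip and the CMake build
export CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn
export CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git@stable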
Note 2: Both pip install 'ms-swift[all]' -U and a full source install (pip install -e '.[all]') fail with the resolution-too-deep error shown below, so I installed only pip install 'ms-swift'.
Collecting binpacking (from ms-swift[all])
Using cached binpacking-1.5.2-py3-none-any.whl
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/fa/1c/d85aa7b120c09615c6d0f791fe581d42eb1fb062478fdc25a4e95dc88113/binpacking-1.5.1.tar.gz (9.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/83/08/5fb79fafc4c857d6712a24250b1fdba6aa3821b9492ccc239a05bf6ccfbf/binpacking-1.5.0.tar.gz (9.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e8/74/fd61be713a1bfe72a7394bc4fe9cb5fc70d0aaf4a4b49a2e8152eed67a59/binpacking-1.4.5.tar.gz (8.9 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/53/b3/2796bc69236c624e46ba02b4e11c3c8d66193ce2124a03c11db190176bfe/binpacking-1.4.3.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e6/de/5e565925472c7f9a987525cb6b49ac32a228fe203cd76c207d041683d40c/binpacking-1.4.2.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/dc/97/7e632f6dcd46c806160211d1e9a5cda1641cbb1a74fb5967024c5aa52ed5/binpacking-1.4.1.tar.gz (7.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/7a/9a/c336fe2f0546f17d945e6f9f6bc06b8b306d10750b20ec6e12715c32f7f8/binpacking-1.4.tar.gz (5.8 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/c9/fe/56782753922a195d332d419949f889c1d59cab7b1780db2351bd8b99501c/binpacking-1.3.tar.gz (5.6 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/9b/e4/a7ee63c0f201c5edb5817e36f964c571112fc00b23e8887bee4b41ac97f4/binpacking-1.2.tar.gz (5.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/51/d6/a26db6fd38fba493c3bfbd51e91b14a985bcc08dcf2900a9fd850f3b8507/binpacking-1.1.tar.gz (5.4 kB)
Preparing metadata (setup.py) ... done
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/d0/eb/7a7e6f4be7376260e97879cf51f1e3b9ff614f31e97355b3e26a587a2535/binpacking-1.0.tar.gz (5.1 kB)
Preparing metadata (setup.py) ... done
Collecting attrdict (from ms-swift[all])
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/ef/97/28fe7e68bc7adfce67d4339756e85e9fcf3c6fd7f0c0781695352b70472c/attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
error: resolution-too-deep
× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.
hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.
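The log above shows pip backtracking through every historical binpacking release before giving up. A possible workaround (untested here; the pinned version below is simply the newest one pip had already cached, not a known-good requirement) is to constrain the packages the resolver keeps backtracking on:

# constraints.txt pins binpacking so pip stops exploring old sdists
printf 'binpacking==1.5.2\n' > constraints.txt
pip install 'ms-swift[all]' -U -c constraints.txt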
Stable branch, using the command:
SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES &&
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git@stable
instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::relu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::relu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::relu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/relu.cu
[41/43] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/.. -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-arnq_5jt/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-arnq_5jt/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/activation/gelu.cu.o -MF CMakeFiles/transformer_engine.dir/activation/gelu.cu.o.d -x cu -c /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu -o CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
/tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_kernels.cuh(930): warning #177-D: variable "input_shape" was declared but never referenced
const auto &input_shape = input.data.shape;
^
detected during:
instantiation of "void transformer_engine::fp8_quantize_arch_ge_100<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1209
instantiation of "void transformer_engine::fp8_quantize<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(const transformer_engine::Tensor &, const transformer_engine::Tensor *, const transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, transformer_engine::Tensor *, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 1255
instantiation of "void transformer_engine::detail::quantize_helper<IS_DBIAS,IS_DACT,IS_ACT,ParamOP,OP>(NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DBIAS=false, IS_DACT=false, IS_ACT=true, ParamOP=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 36 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
instantiation of "void transformer_engine::act_fn<ComputeType,Param,OP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, OP=transformer_engine::gelu]" at line 13 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./../util/cast_gated_kernels.cuh(829): warning #177-D: variable "amax_ptr" was declared but never referenced
float *const amax_ptr = reinterpret_cast<float *>(output->amax.dptr);
^
detected during:
instantiation of "void transformer_engine::gated_kernels::quantize_gated<IS_DGATED,ParamOP,ActOP,DActOP>(const transformer_engine::Tensor &, const transformer_engine::Tensor &, transformer_engine::Tensor *, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 1073
instantiation of "void transformer_engine::detail::quantize_gated_helper<IS_DGATED,ParamOP,ActOP,DActOP>(NVTETensor, NVTETensor, NVTETensor, cudaStream_t) [with IS_DGATED=false, ParamOP=transformer_engine::Empty, ActOP=transformer_engine::gelu, DActOP=(float (*)(float, const transformer_engine::Empty &))nullptr]" at line 59 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/./activation_template.h
instantiation of "void transformer_engine::gated_act_fn<ComputeType,Param,ActOP>(NVTETensor, NVTETensor, cudaStream_t) [with ComputeType=transformer_engine::fp32, Param=transformer_engine::Empty, ActOP=transformer_engine::gelu]" at line 26 of /tmp/pip-req-build-arnq_5jt/transformer_engine/common/activation/gelu.cu
[the input_shape and amax_ptr warnings above repeat verbatim three more times each before the next build step]
[42/43] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/.. -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-arnq_5jt/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-arnq_5jt/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-arnq_5jt/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o -MF CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o.d -x cu -c /tmp/pip-req-build-arnq_5jt/transformer_engine/common/transpose/cast_transpose_fusion.cu -o CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 89, in _build_cmake
subprocess.run(command, cwd=build_dir, check=True)
File "/root/anaconda3/envs/ms/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-arnq_5jt/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 35, in <module>
File "/tmp/pip-req-build-arnq_5jt/setup.py", line 179, in <module>
setuptools.setup(
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
return distutils.core.setup(**attrs)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-req-build-arnq_5jt/setup.py", line 53, in run
super().run()
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/wheel/_bdist_wheel.py", line 387, in run
self.run_command("build")
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 119, in run
ext._build_cmake(
File "/tmp/pip-req-build-arnq_5jt/build_tools/build_ext.py", line 91, in _build_cmake
raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-arnq_5jt/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)
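In both builds the paste stops at "ninja: build stopped: subcommand failed" and only shows warnings, so the actual nvcc error for cast_transpose_fusion.cu is not visible. A sketch for capturing it, assuming only standard git/pip behaviour and that cmake, ninja, and pybind11 are already present in the environment (building from a local checkout instead of pip's temporary directory keeps the full log around after the failure):

git clone --recursive https://gitee.com/zhangtianhonggitee/TransformerEngine.git
cd TransformerEngine
git checkout stable
# -v streams the full cmake/ninja output so the first real compiler error is preserved
pip install -v --no-build-isolation . 2>&1 | tee te_build.log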
Latest (main) branch, using the command:
SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])") && echo $SITE_PACKAGES &&
CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include
pip install git+https://gitee.com/zhangtianhonggitee/TransformerEngine.git
[44/45] /usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-216bf86l/transformer_engine/common/.. -I/tmp/pip-req-build-216bf86l/transformer_engine/common/include -I/usr/local/cuda-12.4/targets/x86_64-linux/include -I/tmp/pip-req-build-216bf86l/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-216bf86l/build/cmake/string_headers -isystem=/usr/local/cuda-12.4/include -Wl,--version-script=/tmp/pip-req-build-216bf86l/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o -MF CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o.d -x cu -c /tmp/pip-req-build-216bf86l/transformer_engine/common/transpose/cast_transpose_fusion.cu -o CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 88, in _build_cmake
subprocess.run(command, cwd=build_dir, check=True)
File "/root/anaconda3/envs/ms/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-216bf86l/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 35, in <module>
File "/tmp/pip-req-build-216bf86l/setup.py", line 187, in <module>
setuptools.setup(
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
return distutils.core.setup(**attrs)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-req-build-216bf86l/setup.py", line 51, in run
super().run()
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/wheel/_bdist_wheel.py", line 387, in run
self.run_command("build")
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/root/anaconda3/envs/ms/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 120, in run
ext._build_cmake(
File "/tmp/pip-req-build-216bf86l/build_tools/build_ext.py", line 90, in _build_cmake
raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '--build', '/tmp/pip-req-build-216bf86l/build/cmake', '--verbose', '--parallel']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Building wheel for nvdlfw-inspect (pyproject.toml) ... done
Created wheel for nvdlfw-inspect: filename=nvdlfw_inspect-0.1.0-py3-none-any.whl size=30813 sha256=e151bc54367e558b8ecd48e00b6fe23645dd5a18be9c4bea0af5101809f4ee62
Stored in directory: /tmp/pip-ephem-wheel-cache-9m88qhu1/wheels/6f/b1/55/1a653c8ad54c41e4081205176009cc4cfc7f06ffc781fa6d0a
Successfully built nvdlfw-inspect
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)
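For completeness, a possible fallback if the source build keeps failing: NVIDIA publishes TransformerEngine packages on PyPI, so if the Tsinghua mirror already used above carries a build compatible with CUDA 12.4 / torch 2.6.0 (an assumption, not verified here), the local CMake build could be avoided entirely:

# hypothetical fallback: try the published package instead of building the Gitee mirror from source
pip install 'transformer_engine[pytorch]' -i https://pypi.tuna.tsinghua.edu.cn/simple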