[Other General Issues] Error when running yolov3_mobilenet_v1_qat inference with the TensorRT option enabled in PaddleDetection #5726

@Disciple7

Description

The PaddleDetection team appreciates any suggestions or problems you report~

Checklist:

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.

Describe the bug

Following the official tutorial, I trained a yolov3_mobilenet_v1_270e_voc model on the VOC dataset, using the default yolov3_mobilenet_v1_qat.yml as the quantization config. After exporting it as a Paddle Serving model, the server runs and detects correctly without the --use_trt option, but shows no speedup or compression benefit. With --use_trt enabled, it fails with the error below.

After consulting the admins in the PaddleServing user group, I was told this is a model-side issue and advised to contact PaddleDetection.

In addition, the benchmark script reports the same kind of TensorRT-related error: trt_fp32, trt_fp16, and trt_int8 cannot be tested, while use_cpu and use_gpu work fine.

Reproduction

  1. What command or script did you run?
    Paddle Serving command (a minimal client-call sketch follows the benchmark commands below):
cd /home/ubuntu/lxd-storage/xzy/PaddleCV/PaddleDetection/inference_model/yolov3_mobilenet_v1_270e_qat_pdserving/yolov3_mobilenet_v1_qat
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --precision int8 --use_trt

Benchmark test commands:

bash deploy/benchmark/benchmark.sh ./inference_model/yolov3_mobilenet_v1_270e_voc_origin model
bash deploy/benchmark/benchmark_quant.sh ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat model
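
For reference, a minimal Paddle Serving client call against the endpoint started above (this works when --use_trt is off). This is only a sketch: the feed/fetch variable names and the 608x608 input size are assumptions for this export and should be read from serving_client/serving_client_conf.prototxt.

import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])

# Dummy input; a real client resizes/normalizes an image the same way the
# deploy pipeline does (Resize -> NormalizeImage -> Permute).
img = np.random.rand(3, 608, 608).astype("float32")
fetch_map = client.predict(
    feed={
        "image": img,
        "im_shape": np.array([608.0, 608.0], dtype="float32"),
        "scale_factor": np.array([1.0, 1.0], dtype="float32"),
    },
    fetch=["multiclass_nms3_0.tmp_0"],  # assumed fetch var; check the prototxt
    batch=False,
)
print(fetch_map)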
  2. Did you make any modifications to the code or config? Do you understand what you modified? Please provide the code you changed.

  3. What dataset did you use?

The VOC dataset.

  4. Please provide the error messages or relevant log information.
    Paddle Serving error message and log (a possible workaround sketch follows the log):
/home/ubuntu/miniconda3/envs/paddle_env/lib/python3.7/runpy.py:125: RuntimeWarning: 'paddle_serving_server.serve' found in sys.modules after import of package 'paddle_serving_server', but prior to execution of 'paddle_serving_server.serve'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Going to Run Comand
/home/ubuntu/miniconda3/envs/paddle_env/lib/python3.7/site-packages/paddle_serving_server/serving-gpu-101-0.8.3/serving -enable_model_toolkit -inferservice_path workdir_9393 -inferservice_file infer_service.prototxt -max_concurrency 0 -num_threads 4 -port 9393 -precision int8 -use_calib=False -reload_interval_s 10 -resource_path workdir_9393 -resource_file resource.prototxt -workflow_path workdir_9393 -workflow_file workflow.prototxt -bthread_concurrency 4 -max_body_size 536870912
I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralDistKVInferOp
I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralDistKVQuantInferOp
I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralInferOp
I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralReaderOp
I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralRecOp
I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralResponseOp
I0100 00:00:00.000000 11926 service_manager.h:79] RAW: Service[LoadGeneralModelService] insert successfully!
I0100 00:00:00.000000 11926 load_general_model_service.pb.h:333] RAW: Success regist service[LoadGeneralModelService][PN5baidu14paddle_serving9predictor26load_general_model_service27LoadGeneralModelServiceImplE]
I0100 00:00:00.000000 11926 service_manager.h:79] RAW: Service[GeneralModelService] insert successfully!
I0100 00:00:00.000000 11926 general_model_service.pb.h:1608] RAW: Success regist service[GeneralModelService][PN5baidu14paddle_serving9predictor13general_model23GeneralModelServiceImplE]
I0100 00:00:00.000000 11926 factory.h:155] RAW: Succ insert one factory, tag: PADDLE_INFER, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 11926 paddle_engine.cpp:34] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<PaddleInferenceEngine>->::baidu::paddle_serving::predictor::InferEngine, tag: PADDLE_INFER in macro!
I0415 14:52:29.661229 11929 analysis_predictor.cc:576] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
I0415 14:52:29.881975 11929 fuse_pass_base.cc:57] ---  detected 47 subgraphs
--- Running IR pass [delete_quant_dequant_filter_op_pass]
I0415 14:52:29.944126 11929 fuse_pass_base.cc:57] ---  detected 47 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_v2_to_mul_pass]
--- Running IR pass [map_matmul_v2_to_matmul_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I0415 14:52:30.001792 11929 tensorrt_subgraph_pass.cc:138] ---  detect a sub-graph with 145 nodes
I0415 14:52:30.034184 11929 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------

----------------------
Error Message Summary:
----------------------
UnimplementedError: no OpConverter for optype [nearest_interp_v2]
  [Hint: it should not be null.] (at /paddle/paddle/fluid/inference/tensorrt/convert/op_converter.h:142)

Aborted (core dumped)
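
From the log, the TensorRT subgraph build aborts because Paddle Inference has no TensorRT OpConverter for nearest_interp_v2 (the nearest-neighbor upsample in the YOLOv3 neck). When building a predictor directly through the Paddle Inference API, one possible workaround is to keep that op out of the TensorRT subgraph so it falls back to the native GPU kernel. A sketch, not a verified fix: exp_disable_tensorrt_ops is an experimental paddle 2.x API, and the model paths below are assumptions.

from paddle.inference import Config, PrecisionType, create_predictor

# Assumed paths to the exported inference model inside serving_server/.
config = Config("serving_server/model.pdmodel", "serving_server/model.pdiparams")
config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory pool, device 0
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Int8,
    use_static=False,
    use_calib_mode=False,
)
# Exclude the unsupported op from the TensorRT subgraph; it then runs on the
# regular Paddle GPU kernel instead of requiring a TRT OpConverter.
config.exp_disable_tensorrt_ops(["nearest_interp_v2"])
predictor = create_predictor(config)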

Benchmark test error (a sketch for checking the wheel's TensorRT support follows the log):

model_dir : ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat
img_dir: demo/fire_smoke_demo
model  ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat, run_mode: trt_int8
-----------  Running Arguments -----------
batch_size: 1
camera_id: -1
cpu_threads: 1
device: GPU
enable_mkldnn: False
image_dir: demo/fire_smoke_demo
image_file: None
model_dir: ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat
output_dir: output
reid_batch_size: 50
reid_model_dir: None
run_benchmark: True
run_mode: trt_int8
save_images: False
save_mot_txt_per_img: False
save_mot_txts: False
scaled: False
threshold: 0.5
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
use_dark: True
use_gpu: False
video_file: None
------------------------------------------
-----------  Model Configuration -----------
Model Arch: YOLO
Transform Order:
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute
--------------------------------------------
Traceback (most recent call last):
  File "deploy/python/infer.py", line 773, in <module>
    main()
  File "deploy/python/infer.py", line 726, in main
    enable_mkldnn=FLAGS.enable_mkldnn)
  File "deploy/python/infer.py", line 94, in __init__
    enable_mkldnn=enable_mkldnn)
  File "deploy/python/infer.py", line 563, in load_predictor
    predictor = create_predictor(config)
ValueError: (InvalidArgument) Pass tensorrt_subgraph_pass has not been registered. Please use the paddle inference library compiled with tensorrt or disable the tensorrt engine in inference configuration!
  [Hint: Expected Has(pass_type) == true, but received Has(pass_type):0 != true:1.] (at /paddle/paddle/fluid/framework/ir/pass.h:236)
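
This second error indicates that the paddlepaddle wheel used for the benchmark run was built without TensorRT support, since tensorrt_subgraph_pass is only registered in TensorRT-enabled builds. A quick sanity-check sketch (get_trt_compile_version exists in recent paddle 2.x wheels; its availability and zero-return behavior here are assumptions worth verifying):

import paddle
import paddle.inference as paddle_infer

print(paddle.version.full_version)  # e.g. 2.2.2
try:
    # TensorRT version the wheel was compiled against; all zeros (or a
    # missing attribute) suggests a build without TensorRT support.
    print(paddle_infer.get_trt_compile_version())
except AttributeError:
    print("this paddle wheel exposes no TensorRT build info")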

Environment

  1. Please provide the versions of Paddle and PaddleDetection you used:
    paddlepaddle-gpu=2.2.2.post101
    paddledet=2.3.0

  2. Please provide the versions of any other related tools/products used, such as PaddleServing:
    paddleslim=2.2.2
    paddle-serving-server-gpu=0.8.3.post101
    tensorrt=6.0.1.5

  3. Please provide the OS information, e.g., Linux/Windows/MacOS:
    Ubuntu 16.04

  4. Please provide the version of Python you used.
    Python 3.7

  5. Please provide the version of CUDA/cuDNN you used.
    CUDA 10.1
