Description
I used apex's ASP to apply N:M sparsity to my model. When I generate the engine with ./trtexec --onnx=sparse.onnx --saveEngine=sparse.trt --sparsity=enable --fp16 --verbose, I get the following output:
[08/05/2024-12:06:54] [I] [TRT] (Sparsity) Layers eligible for sparse math: MatMul_401, Conv_619 + Relu_620, Conv_622 + Relu_623, Conv_235 + Relu_236, Conv_626, Conv_239, Conv_621 + Add_627 + Relu_628, Conv_234 + Add_240 + Relu_241, Conv_243 + Relu_244, Conv_630 + Relu_631, Conv_247, Conv_634, Conv_242 + Add_248 + Relu_249, Conv_629 + Add_635 + Relu_636, Conv_637 + Relu_638, Conv_250 + Relu_251, Conv_641 + Add_642 + Relu_643, Conv_254 + Add_255 + Relu_256, Conv_258 + Relu_259, Conv_644 + Relu_645, Conv_262, Conv_257 + Add_263 + Relu_264, Conv_648 + Add_649 + Relu_650, Conv_652 + Relu_653, Conv_265 + Relu_266, Conv_656, Conv_651 + Add_657 + Relu_658, Conv_269 + Add_270 + Relu_271, Conv_272 + Relu_273, Conv_659 + Relu_660, Conv_276 + Add_277 + Relu_278, Conv_663 + Add_664 + Relu_665, Conv_666 + Relu_667, Conv_279 + Relu_280, Conv_670 + Add_671 + Relu_672, Conv_283 + Add_284 + Relu_285, Conv_286 + Relu_287, Conv_673 + Relu_674, Conv_290 + Add_291 + Relu_292, Conv_677 + Add_678 + Relu_679, Conv_680 + Relu_681, Conv_293 + Relu_294, Conv_684 + Add_685 + Relu_686, Conv_297 + Add_298 + Relu_299, Conv_300 + Relu_301, Conv_687 + Relu_688, Conv_304 + Add_305 + Relu_306, Conv_691 + Add_692 + Relu_693, Conv_694 + Relu_695, Conv_308 + Relu_309, Conv_312, Conv_698 + Add_699 + Relu_700, Conv_307 + Add_313 + Relu_314, Conv_315 + Relu_316, Conv_702 + Relu_703, Conv_706, Conv_319 + Add_320 + Relu_321, Conv_701 + Add_707 + Relu_708, Conv_709 + Relu_710, Conv_322 + Relu_323, Conv_713 + Add_714 + Relu_715, Conv_326 + Add_327 + Relu_328, Conv_329 + Relu_330, Conv_716 + Relu_717, Conv_333 + Add_334 + Relu_335, Conv_720 + Add_721 + Relu_722, Conv_723 + Relu_724, Conv_336 + Relu_337, Conv_727 + Add_728 + Relu_729, Conv_340 + Add_341 + Relu_342, Conv_343 + Relu_344, Conv_730 + Relu_731, Conv_347 + Add_348 + Relu_349, Conv_734 + Add_735 + Relu_736, Conv_737, Conv_350 + Relu_351, Conv_739 + Add_740, Conv_742 + Add_743, Conv_354 + Add_355 + Relu_356, Conv_357 + Relu_358, Conv_361 + Add_362 + 
Relu_363, Conv_364 + Relu_365, Conv_368 + Add_369 + Relu_370, Conv_371 + Relu_372, Conv_375 + Add_376 + Relu_377, Conv_378 + Relu_379, Conv_382 + Add_383 + Relu_384, Conv_385 + Relu_386, Conv_389 + Add_390 + Relu_391, Conv_392, Conv_394 + Add_395, Conv_397 + Add_398, Conv_472 || Conv_443 || Conv_438, MatMul_514, MatMul_513, MatMul_592, Conv_607 + Relu_608, Conv_609 + Add_610, Conv_615, Conv_612 + Relu_613, Conv_614 + Add_617 + Relu_618, Conv_752 + Relu_753, Conv_754, Conv_823 || Conv_813 || Conv_809 || Conv_799 || Conv_789 || Conv_785 || Conv_775 || Conv_765, Conv_761 || Conv_825 || Conv_821 || Conv_819 || Conv_817 || Conv_815 || Conv_811 || Conv_807, Conv_805 || Conv_803 || Conv_801 || Conv_797 || Conv_795 || Conv_793 || Conv_791 || Conv_787, Conv_783 || Conv_781 || Conv_779 || Conv_777 || Conv_773 || Conv_771 || Conv_769 || Conv_767, Conv_763 || Conv_759 || Conv_757 || Conv_755
[08/05/2024-12:06:54] [I] [TRT] (Sparsity) TRT inference plan picked sparse implementation for layers: Conv_626, Conv_243 + Relu_244, Conv_630 + Relu_631, Conv_247, Conv_634, Conv_637 + Relu_638, Conv_250 + Relu_251, Conv_254 + Add_255 + Relu_256, Conv_258 + Relu_259, Conv_644 + Relu_645, Conv_262, Conv_257 + Add_263 + Relu_264, Conv_652 + Relu_653, Conv_265 + Relu_266, Conv_656, Conv_269 + Add_270 + Relu_271, Conv_272 + Relu_273, Conv_659 + Relu_660, Conv_276 + Add_277 + Relu_278, Conv_666 + Relu_667, Conv_279 + Relu_280, Conv_283 + Add_284 + Relu_285, Conv_286 + Relu_287, Conv_673 + Relu_674, Conv_290 + Add_291 + Relu_292, Conv_680 + Relu_681, Conv_293 + Relu_294, Conv_297 + Add_298 + Relu_299, Conv_300 + Relu_301, Conv_687 + Relu_688, Conv_304 + Add_305 + Relu_306, Conv_694 + Relu_695, Conv_308 + Relu_309, Conv_312, Conv_307 + Add_313 + Relu_314, Conv_315 + Relu_316, Conv_702 + Relu_703, Conv_706, Conv_319 + Add_320 + Relu_321, Conv_709 + Relu_710, Conv_322 + Relu_323, Conv_326 + Add_327 + Relu_328, Conv_329 + Relu_330, Conv_716 + Relu_717, Conv_333 + Add_334 + Relu_335, Conv_723 + Relu_724, Conv_336 + Relu_337, Conv_340 + Add_341 + Relu_342, Conv_343 + Relu_344, Conv_730 + Relu_731, Conv_347 + Add_348 + Relu_349, Conv_737, Conv_350 + Relu_351, Conv_354 + Add_355 + Relu_356, Conv_357 + Relu_358, Conv_361 + Add_362 + Relu_363, Conv_364 + Relu_365, Conv_368 + Add_369 + Relu_370, Conv_371 + Relu_372, Conv_375 + Add_376 + Relu_377, Conv_378 + Relu_379, Conv_382 + Add_383 + Relu_384, Conv_385 + Relu_386, Conv_389 + Add_390 + Relu_391, Conv_392, Conv_394 + Add_395, Conv_397 + Add_398, MatMul_514, MatMul_513, MatMul_592, Conv_607 + Relu_608, Conv_609 + Add_610, Conv_615, Conv_752 + Relu_753, Conv_823 || Conv_813 || Conv_809 || Conv_799 || Conv_789 || Conv_785 || Conv_775 || Conv_765, Conv_761 || Conv_825 || Conv_821 || Conv_819 || Conv_817 || Conv_815 || Conv_811 || Conv_807, Conv_805 || Conv_803 || Conv_801 || Conv_797 || Conv_795 || Conv_793 || Conv_791 || Conv_787, 
Conv_783 || Conv_781 || Conv_779 || Conv_777 || Conv_773 || Conv_771 || Conv_769 || Conv_767, Conv_763 || Conv_759 || Conv_757 || Conv_755
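To see exactly which eligible layers did not receive a sparse implementation, the two comma-separated layer lists from the log lines above can be diffed with a small helper like this (a hypothetical script, not part of trtexec):

```python
def not_picked(eligible_line, picked_line):
    """Return the layers listed as eligible for sparse math that do not
    appear in the list of layers for which TRT picked a sparse kernel."""
    split = lambda s: {name.strip() for name in s.split(",") if name.strip()}
    return sorted(split(eligible_line) - split(picked_line))

# Paste the layer lists from the two "[TRT] (Sparsity)" log lines here:
eligible = "Conv_663 + Add_664 + Relu_665, Conv_276 + Add_277 + Relu_278"
picked = "Conv_276 + Add_277 + Relu_278"
print(not_picked(eligible, picked))  # ['Conv_663 + Add_664 + Relu_665']
```

The fused layer names contain "+" and "||" but no commas, so splitting on commas keeps each fused layer intact.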
I marked the layers that are eligible for sparse math but for which the inference plan did not pick a sparse implementation; these are the words marked in red in the picture below.
I saw an answer in a previous issue saying that convolution layers with few channels or a small kernel size will not use a sparse implementation.
But I observed the opposite in my model: many convolution layers with relatively few channels did pick a sparse implementation, while many layers with more channels did not.
For example, Conv_663 + Add_664 + Relu_665, whose Conv layer has weight shape [288,288,1,1], did not get a sparse implementation, while Conv_276 + Add_277 + Relu_278, whose Conv layer has weight shape [160,160,1,1], did.
So is there any other factor that affects whether a sparse implementation is picked?
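One thing worth ruling out is whether the weights of the skipped layers actually satisfy the 2:4 pattern along the input-channel axis; if any group of four violates it, TensorRT falls back to a dense kernel for that layer. A minimal NumPy check, assuming the usual [K, C, R, S] conv weight layout with groups taken along C:

```python
import numpy as np

def follows_2_4(weights):
    """Check the 2:4 structured-sparsity pattern for a conv weight tensor
    of shape [K, C, R, S]: every group of four consecutive input-channel
    values must contain at most two non-zeros."""
    k, c, r, s = weights.shape
    if c % 4 != 0:
        return False  # a channel count not divisible by 4 cannot be 2:4
    # Move C to the innermost axis so each row of 4 spans input channels.
    groups = weights.transpose(0, 2, 3, 1).reshape(-1, 4)
    return bool(np.all(np.count_nonzero(groups, axis=1) <= 2))

# A [2, 4, 1, 1] tensor with two non-zeros per group of four channels:
ok = np.array([[[[1.0]], [[0.0]], [[2.0]], [[0.0]]],
               [[[0.0]], [[3.0]], [[0.0]], [[4.0]]]])
print(follows_2_4(ok))  # True
```

Running this over the exported weights of Conv_663 would confirm whether the pattern itself is the problem or whether the builder simply preferred a dense tactic for that shape.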
Environment
TensorRT Version: 8.5.2.2
NVIDIA GPU: Orin
Operating System: Linux
Python Version (if applicable): 3.8.10
Steps To Reproduce
Commands or scripts:
./trtexec --onnx=sparse.onnx --saveEngine=sparse.trt --sparsity=enable --fp16 --verbose
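The pruning itself was done with apex's ASP before exporting the ONNX model. Conceptually, 2:4 magnitude pruning zeroes the two smallest-magnitude values in each group of four, as in this NumPy sketch (an illustration of the technique, not the actual ASP code):

```python
import numpy as np

def prune_2_4(weights):
    """Zero the two smallest-magnitude values in every group of four
    consecutive values along the flattened last axis (2:4 pruning)."""
    w = weights.copy().reshape(-1, 4)
    smallest = np.argsort(np.abs(w), axis=1)[:, :2]  # two smallest per group
    np.put_along_axis(w, smallest, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[4.0, -1.0, 3.0, 0.5], [2.0, 8.0, -6.0, 1.0]])
print(prune_2_4(w))  # [[ 4.  0.  3.  0.], [ 0.  8. -6.  0.]]
```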