
What determines whether or not a sparse implementation is picked for TRT inference? #4056

@laurenlong

Description

I used apex's ASP to apply N:M sparsity. When I run `./trtexec --onnx=sparse.onnx --saveEngine=sparse.trt --sparsity=enable --fp16 --verbose` to build the engine, I get the following output:

[08/05/2024-12:06:54] [I] [TRT] (Sparsity) Layers eligible for sparse math: MatMul_401, Conv_619 + Relu_620, Conv_622 + Relu_623, Conv_235 + Relu_236, Conv_626, Conv_239, Conv_621 + Add_627 + Relu_628, Conv_234 + Add_240 + Relu_241, Conv_243 + Relu_244, Conv_630 + Relu_631, Conv_247, Conv_634, Conv_242 + Add_248 + Relu_249, Conv_629 + Add_635 + Relu_636, Conv_637 + Relu_638, Conv_250 + Relu_251, Conv_641 + Add_642 + Relu_643, Conv_254 + Add_255 + Relu_256, Conv_258 + Relu_259, Conv_644 + Relu_645, Conv_262, Conv_257 + Add_263 + Relu_264, Conv_648 + Add_649 + Relu_650, Conv_652 + Relu_653, Conv_265 + Relu_266, Conv_656, Conv_651 + Add_657 + Relu_658, Conv_269 + Add_270 + Relu_271, Conv_272 + Relu_273, Conv_659 + Relu_660, Conv_276 + Add_277 + Relu_278, Conv_663 + Add_664 + Relu_665, Conv_666 + Relu_667, Conv_279 + Relu_280, Conv_670 + Add_671 + Relu_672, Conv_283 + Add_284 + Relu_285, Conv_286 + Relu_287, Conv_673 + Relu_674, Conv_290 + Add_291 + Relu_292, Conv_677 + Add_678 + Relu_679, Conv_680 + Relu_681, Conv_293 + Relu_294, Conv_684 + Add_685 + Relu_686, Conv_297 + Add_298 + Relu_299, Conv_300 + Relu_301, Conv_687 + Relu_688, Conv_304 + Add_305 + Relu_306, Conv_691 + Add_692 + Relu_693, Conv_694 + Relu_695, Conv_308 + Relu_309, Conv_312, Conv_698 + Add_699 + Relu_700, Conv_307 + Add_313 + Relu_314, Conv_315 + Relu_316, Conv_702 + Relu_703, Conv_706, Conv_319 + Add_320 + Relu_321, Conv_701 + Add_707 + Relu_708, Conv_709 + Relu_710, Conv_322 + Relu_323, Conv_713 + Add_714 + Relu_715, Conv_326 + Add_327 + Relu_328, Conv_329 + Relu_330, Conv_716 + Relu_717, Conv_333 + Add_334 + Relu_335, Conv_720 + Add_721 + Relu_722, Conv_723 + Relu_724, Conv_336 + Relu_337, Conv_727 + Add_728 + Relu_729, Conv_340 + Add_341 + Relu_342, Conv_343 + Relu_344, Conv_730 + Relu_731, Conv_347 + Add_348 + Relu_349, Conv_734 + Add_735 + Relu_736, Conv_737, Conv_350 + Relu_351, Conv_739 + Add_740, Conv_742 + Add_743, Conv_354 + Add_355 + Relu_356, Conv_357 + Relu_358, Conv_361 + Add_362 + Relu_363, Conv_364 + Relu_365, Conv_368 + Add_369 + Relu_370, Conv_371 + Relu_372, Conv_375 + Add_376 + Relu_377, Conv_378 + Relu_379, Conv_382 + Add_383 + Relu_384, Conv_385 + Relu_386, Conv_389 + Add_390 + Relu_391, Conv_392, Conv_394 + Add_395, Conv_397 + Add_398, Conv_472 || Conv_443 || Conv_438, MatMul_514, MatMul_513, MatMul_592, Conv_607 + Relu_608, Conv_609 + Add_610, Conv_615, Conv_612 + Relu_613, Conv_614 + Add_617 + Relu_618, Conv_752 + Relu_753, Conv_754, Conv_823 || Conv_813 || Conv_809 || Conv_799 || Conv_789 || Conv_785 || Conv_775 || Conv_765, Conv_761 || Conv_825 || Conv_821 || Conv_819 || Conv_817 || Conv_815 || Conv_811 || Conv_807, Conv_805 || Conv_803 || Conv_801 || Conv_797 || Conv_795 || Conv_793 || Conv_791 || Conv_787, Conv_783 || Conv_781 || Conv_779 || Conv_777 || Conv_773 || Conv_771 || Conv_769 || Conv_767, Conv_763 || Conv_759 || Conv_757 || Conv_755
[08/05/2024-12:06:54] [I] [TRT] (Sparsity) TRT inference plan picked sparse implementation for layers: Conv_626, Conv_243 + Relu_244, Conv_630 + Relu_631, Conv_247, Conv_634, Conv_637 + Relu_638, Conv_250 + Relu_251, Conv_254 + Add_255 + Relu_256, Conv_258 + Relu_259, Conv_644 + Relu_645, Conv_262, Conv_257 + Add_263 + Relu_264, Conv_652 + Relu_653, Conv_265 + Relu_266, Conv_656, Conv_269 + Add_270 + Relu_271, Conv_272 + Relu_273, Conv_659 + Relu_660, Conv_276 + Add_277 + Relu_278, Conv_666 + Relu_667, Conv_279 + Relu_280, Conv_283 + Add_284 + Relu_285, Conv_286 + Relu_287, Conv_673 + Relu_674, Conv_290 + Add_291 + Relu_292, Conv_680 + Relu_681, Conv_293 + Relu_294, Conv_297 + Add_298 + Relu_299, Conv_300 + Relu_301, Conv_687 + Relu_688, Conv_304 + Add_305 + Relu_306, Conv_694 + Relu_695, Conv_308 + Relu_309, Conv_312, Conv_307 + Add_313 + Relu_314, Conv_315 + Relu_316, Conv_702 + Relu_703, Conv_706, Conv_319 + Add_320 + Relu_321, Conv_709 + Relu_710, Conv_322 + Relu_323, Conv_326 + Add_327 + Relu_328, Conv_329 + Relu_330, Conv_716 + Relu_717, Conv_333 + Add_334 + Relu_335, Conv_723 + Relu_724, Conv_336 + Relu_337, Conv_340 + Add_341 + Relu_342, Conv_343 + Relu_344, Conv_730 + Relu_731, Conv_347 + Add_348 + Relu_349, Conv_737, Conv_350 + Relu_351, Conv_354 + Add_355 + Relu_356, Conv_357 + Relu_358, Conv_361 + Add_362 + Relu_363, Conv_364 + Relu_365, Conv_368 + Add_369 + Relu_370, Conv_371 + Relu_372, Conv_375 + Add_376 + Relu_377, Conv_378 + Relu_379, Conv_382 + Add_383 + Relu_384, Conv_385 + Relu_386, Conv_389 + Add_390 + Relu_391, Conv_392, Conv_394 + Add_395, Conv_397 + Add_398, MatMul_514, MatMul_513, MatMul_592, Conv_607 + Relu_608, Conv_609 + Add_610, Conv_615, Conv_752 + Relu_753, Conv_823 || Conv_813 || Conv_809 || Conv_799 || Conv_789 || Conv_785 || Conv_775 || Conv_765, Conv_761 || Conv_825 || Conv_821 || Conv_819 || Conv_817 || Conv_815 || Conv_811 || Conv_807, Conv_805 || Conv_803 || Conv_801 || Conv_797 || Conv_795 || Conv_793 || Conv_791 || Conv_787, Conv_783 || Conv_781 || Conv_779 || Conv_777 || Conv_773 || Conv_771 || Conv_769 || Conv_767, Conv_763 || Conv_759 || Conv_757 || Conv_755
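For reference, the N:M masks came from apex ASP's standard recipe; a minimal sketch of that flow is below (`model`, `optimizer`, and `dummy_input` are placeholders, not my exact code):

```python
import torch
from apex.contrib.sparsity import ASP

# model/optimizer stand in for the already-trained network and its optimizer.
# prune_trained_model computes 2:4 masks for eligible Linear/Conv weights and
# patches the optimizer so the masks are re-applied after every step.
ASP.prune_trained_model(model, optimizer)

# ... fine-tune to recover accuracy ...

# Export the masked (now N:M sparse) weights to ONNX for trtexec.
torch.onnx.export(model, dummy_input, "sparse.onnx", opset_version=13)
```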

I marked the layers that are eligible for sparse math but for which the TRT inference plan did not pick the sparse implementation. The red entries in the picture below are what I marked.
[image: log excerpt with the unpicked layers highlighted in red]
I saw an answer in a previous issue saying that convolutional layers with few channels or a small kernel size will not use a sparse implementation.
But I observed the opposite in my model: many convolutional layers with relatively few channels picked the sparse implementation, while many convolutional layers with more channels did not.

For example, Conv_663 + Add_664 + Relu_665, whose Conv layer has weight shape [288,288,1,1], did not pick the sparse implementation, while Conv_276 + Add_277 + Relu_278, whose Conv layer has weight shape [160,160,1,1], did.
So is there any other factor that affects whether the sparse implementation is picked?
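To rule out a broken mask on specific layers, this is the kind of check I would run on the exported weights (a sketch, assuming the 2:4 pattern is applied along the input-channel axis; only `sparse.onnx` is from the repro above):

```python
import numpy as np
import onnx
from onnx import numpy_helper

def is_2to4_sparse(w: np.ndarray) -> bool:
    # Check every group of 4 consecutive input channels (dim 1 of a
    # KCRS conv weight) for at most 2 non-zeros.
    k, c, r, s = w.shape
    if c % 4 != 0:
        return False
    groups = w.transpose(0, 2, 3, 1).reshape(k * r * s, c // 4, 4)
    return bool(((groups != 0).sum(axis=-1) <= 2).all())

model = onnx.load("sparse.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.ndim == 4:  # conv weights only
        print(init.name, w.shape, is_2to4_sparse(w))
```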

Environment

TensorRT Version: 8.5.2.2

NVIDIA GPU: Orin

Operating System: Linux

Python Version (if applicable): 3.8.10

Steps To Reproduce

Commands or scripts:
./trtexec --onnx=sparse.onnx --saveEngine=sparse.trt --sparsity=enable --fp16 --verbose
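The same build can also be reproduced through the TensorRT Python API; below is a sketch equivalent to the trtexec command (the FP16 and SPARSE_WEIGHTS flags mirror --fp16 and --sparsity=enable):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("sparse.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)            # --fp16
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # --sparsity=enable

engine_bytes = builder.build_serialized_network(network, config)
with open("sparse.trt", "wb") as f:
    f.write(engine_bytes)
```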

Metadata

Labels: Investigating (under investigation by TensorRT devs), Module:Engine Build, triaged
