yolov5: upgrade to v7.0 and support instance segmentation (wang-xinyu#1177)

wang-xinyu · web-flow · commit fbad2261dc68 · 2022-12-18T23:15:07.000+08:00
* add seg

* yolov5 seg

* update wts and readme

* add todo

* update readme
diff --git a/README.md b/README.md
@@ -15,6 +15,7 @@ The basic workflow of TensorRTx is:
 
 ## News
 
+- `18 Dec 2022`. [YOLOv5](./yolov5) upgrade to support v7.0, including instance segmentation.
 - `12 Dec 2022`. [East-Face](https://github.com/East-Face): [UNet](./unet) upgrade to support v3.0 of [Pytorch-UNet](https://github.com/milesial/Pytorch-UNet).
 - `26 Oct 2022`. [ausk](https://github.com/ausk): YoloP(You Only Look Once for Panoptic Driving Perception).
 - `19 Sep 2022`. [QIANXUNZDL123](https://github.com/QIANXUNZDL123) and [lindsayshuo](https://github.com/lindsayshuo): YOLOv7.
@@ -29,7 +30,6 @@ The basic workflow of TensorRTx is:
 - `18 Oct 2021`. [xupengao](https://github.com/xupengao): YOLOv5 updated to v6.0, supporting n/s/m/l/x/n6/s6/m6/l6/x6.
 - `31 Aug 2021`. [FamousDirector](https://github.com/FamousDirector): update retinaface to support TensorRT 8.0.
 - `27 Aug 2021`. [HaiyangPeng](https://github.com/HaiyangPeng): add a python wrapper for hrnet segmentation.
-- `1 Jul 2021`. [freedenS](https://github.com/freedenS): DE⫶TR: End-to-End Object Detection with Transformers. First Transformer model!
 
 ## Tutorials
 
@@ -75,7 +75,7 @@ Following models are implemented.
 |[yolov3](./yolov3)| darknet-53, weights and pytorch implementation from [ultralytics/yolov3](https://github.com/ultralytics/yolov3) |
 |[yolov3-spp](./yolov3-spp)| darknet-53, weights and pytorch implementation from [ultralytics/yolov3](https://github.com/ultralytics/yolov3) |
 |[yolov4](./yolov4)| CSPDarknet53, weights from [AlexeyAB/darknet](https://github.com/AlexeyAB/darknet#pre-trained-models), pytorch implementation from [ultralytics/yolov3](https://github.com/ultralytics/yolov3) |
-|[yolov5](./yolov5)| yolov5 v1.0-v6.2, pytorch implementation from [ultralytics/yolov5](https://github.com/ultralytics/yolov5) |
+|[yolov5](./yolov5)| yolov5 v1.0-v7.0 of [ultralytics/yolov5](https://github.com/ultralytics/yolov5), detection, classification and instance segmentation |
 |[yolov7](./yolov7)| yolov7 v0.1, pytorch implementation from [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) |
 |[yolop](./yolop)| yolop, pytorch implementation from [hustvl/YOLOP](https://github.com/hustvl/YOLOP) |
 |[retinaface](./retinaface)| resnet50 and mobilenet0.25, weights from [biubug6/Pytorch_Retinaface](https://github.com/biubug6/Pytorch_Retinaface) |
diff --git a/yolov5/CMakeLists.txt b/yolov5/CMakeLists.txt
@@ -20,6 +20,7 @@ include_directories(${PROJECT_SOURCE_DIR}/include)
 include_directories(/usr/local/cuda/include)
 link_directories(/usr/local/cuda/lib64)
 # tensorrt
+# TODO(Call for PR): make TRT path configurable from command line
 include_directories(/usr/include/x86_64-linux-gnu/)
 link_directories(/usr/lib/x86_64-linux-gnu/)
 
@@ -44,8 +45,14 @@ target_link_libraries(yolov5-cls cudart)
 target_link_libraries(yolov5-cls myplugins)
 target_link_libraries(yolov5-cls ${OpenCV_LIBS})
 
+cuda_add_executable(yolov5-seg calibrator.cpp yolov5_seg.cpp preprocess.cu)
+
+target_link_libraries(yolov5-seg nvinfer)
+target_link_libraries(yolov5-seg cudart)
+target_link_libraries(yolov5-seg myplugins)
+target_link_libraries(yolov5-seg ${OpenCV_LIBS})
+
 if(UNIX)
 add_definitions(-O2 -pthread)
 endif(UNIX)
 
-
diff --git a/yolov5/README.md b/yolov5/README.md
@@ -33,8 +33,9 @@ TensorRTx inference code base for [ultralytics/yolov5](https://github.com/ultral
 
 ## Different versions of yolov5
 
-Currently, we support yolov5 v1.0, v2.0, v3.0, v3.1, v4.0, v5.0, v6.0, v6.2
+Currently, we support yolov5 v1.0, v2.0, v3.0, v3.1, v4.0, v5.0, v6.0, v6.2, v7.0
 
+- For yolov5 v7.0, download .pt from [yolov5 release v7.0](https://github.com/ultralytics/yolov5/releases/tag/v7.0), `git clone -b v7.0 https://github.com/ultralytics/yolov5.git` and `git clone -b yolov5-v7.0 https://github.com/wang-xinyu/tensorrtx.git`, then follow how-to-run in [tensorrtx/yolov5-v7.0](https://github.com/wang-xinyu/tensorrtx/tree/yolov5-v7.0/yolov5)
 - For yolov5 v6.2, download .pt from [yolov5 release v6.2](https://github.com/ultralytics/yolov5/releases/tag/v6.2), `git clone -b v6.2 https://github.com/ultralytics/yolov5.git` and `git clone -b yolov5-v6.2 https://github.com/wang-xinyu/tensorrtx.git`, then follow how-to-run in [tensorrtx/yolov5-v6.2](https://github.com/wang-xinyu/tensorrtx/tree/yolov5-v6.2/yolov5)
 - For yolov5 v6.0, download .pt from [yolov5 release v6.0](https://github.com/ultralytics/yolov5/releases/tag/v6.0), `git clone -b v6.0 https://github.com/ultralytics/yolov5.git` and `git clone -b yolov5-v6.0 https://github.com/wang-xinyu/tensorrtx.git`, then follow how-to-run in [tensorrtx/yolov5-v6.0](https://github.com/wang-xinyu/tensorrtx/tree/yolov5-v6.0/yolov5).
 - For yolov5 v5.0, download .pt from [yolov5 release v5.0](https://github.com/ultralytics/yolov5/releases/tag/v5.0), `git clone -b v5.0 https://github.com/ultralytics/yolov5.git` and `git clone -b yolov5-v5.0 https://github.com/wang-xinyu/tensorrtx.git`, then follow how-to-run in [tensorrtx/yolov5-v5.0](https://github.com/wang-xinyu/tensorrtx/tree/yolov5-v5.0/yolov5).
@@ -63,7 +64,7 @@ Currently, we support yolov5 v1.0, v2.0, v3.0, v3.1, v4.0, v5.0, v6.0, v6.2
 
 ```
 // clone code according to above #Different versions of yolov5
-// download https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt
+// download https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt
 cp {tensorrtx}/yolov5/gen_wts.py {ultralytics}/yolov5
 cd {ultralytics}/yolov5
 python gen_wts.py -w yolov5s.pt -o yolov5s.wts
@@ -103,6 +104,10 @@ python yolov5_trt.py
 python yolov5_trt_cuda_python.py
 ```
 
+<p align="center">
+<img src="https://user-images.githubusercontent.com/15235574/78247927-4d9fac00-751e-11ea-8b1b-704a0aeb3fcf.jpg" height="360px;">
+</p>
+
 ### Classification
 
 ```
@@ -116,6 +121,20 @@ wget https://github.com/joannzhang00/ImageNet-dataset-classes-labels/blob/main/i
 ./yolov5-cls -d yolov5s-cls.engine ../samples
 ```
 
+### Instance Segmentation
+
+```
+# Build and serialize TensorRT engine
+./yolov5-seg -s yolov5s-seg.wts yolov5s-seg.engine s
+
+# Run inference
+./yolov5-seg -d yolov5s-seg.engine ../samples
+```
+
+<p align="center">
+<img src="https://user-images.githubusercontent.com/15235574/208305921-0a2ee358-6550-4d36-bb86-867685bfe069.jpg" height="360px;">
+</p>
+
 # INT8 Quantization
 
 1. Prepare calibration images; you can randomly select roughly 1000 images from your training set. For coco, you can also download my calibration images `coco_calib` from [GoogleDrive](https://drive.google.com/drive/folders/1s7jE9DtOngZMzJC1uL307J2MiaGwdRSI?usp=sharing) or [BaiduPan](https://pan.baidu.com/s/1GOm_-JobpyLMAqZWCDUhKg) pwd: a9wh
@@ -126,9 +145,6 @@ wget https://github.com/joannzhang00/ImageNet-dataset-classes-labels/blob/main/i
 
 4. serialize the model and test
 
-<p align="center">
-<img src="https://user-images.githubusercontent.com/15235574/78247927-4d9fac00-751e-11ea-8b1b-704a0aeb3fcf.jpg" height="360px;">
-</p>
 
 ## More Information
 
diff --git a/yolov5/common.hpp b/yolov5/common.hpp
@@ -162,6 +162,7 @@ ILayer* convBlock(INetworkDefinition *network, std::map<std::string, Weights>& w
     conv1->setStrideNd(DimsHW{ s, s });
     conv1->setPaddingNd(DimsHW{ p, p });
     conv1->setNbGroups(g);
+    conv1->setName((lname + ".conv").c_str());
     IScaleLayer* bn1 = addBatchNorm2d(network, weightMap, *conv1->getOutput(0), lname + ".bn", 1e-3);
 
     // silu = x * sigmoid
@@ -273,6 +274,21 @@ ILayer* SPPF(INetworkDefinition *network, std::map<std::string, Weights>& weight
     return cv2;
 }
 
+ILayer* Proto(INetworkDefinition* network, std::map<std::string, Weights>& weightMap, ITensor& input, int c_, int c2, std::string lname) {
+    auto cv1 = convBlock(network, weightMap, input, c_, 3, 1, 1, lname + ".cv1");
+
+    auto upsample = network->addResize(*cv1->getOutput(0));
+    assert(upsample);
+    upsample->setResizeMode(ResizeMode::kNEAREST);
+    const float scales[] = {1, 2, 2};
+    upsample->setScales(scales, 3);
+
+    auto cv2 = convBlock(network, weightMap, *upsample->getOutput(0), c_, 3, 1, 1, lname + ".cv2");
+    auto cv3 = convBlock(network, weightMap, *cv2->getOutput(0), c2, 1, 1, 1, lname + ".cv3");
+    assert(cv3);
+    return cv3;
+}
+
 std::vector<std::vector<float>> getAnchors(std::map<std::string, Weights>& weightMap, std::string lname) {
     std::vector<std::vector<float>> anchors;
     Weights wts = weightMap[lname + ".anchor_grid"];
@@ -285,13 +301,13 @@ std::vector<std::vector<float>> getAnchors(std::map<std::string, Weights>& weigh
     return anchors;
 }
 
-IPluginV2Layer* addYoLoLayer(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, std::string lname, std::vector<IConvolutionLayer*> dets) {
+IPluginV2Layer* addYoLoLayer(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, std::string lname, std::vector<IConvolutionLayer*> dets, bool is_segmentation = false) {
     auto creator = getPluginRegistry()->getPluginCreator("YoloLayer_TRT", "1");
     auto anchors = getAnchors(weightMap, lname);
     PluginField plugin_fields[2];
-    int netinfo[4] = {Yolo::CLASS_NUM, Yolo::INPUT_W, Yolo::INPUT_H, Yolo::MAX_OUTPUT_BBOX_COUNT};
+    int netinfo[5] = {Yolo::CLASS_NUM, Yolo::INPUT_W, Yolo::INPUT_H, Yolo::MAX_OUTPUT_BBOX_COUNT, (int)is_segmentation};
     plugin_fields[0].data = netinfo;
-    plugin_fields[0].length = 4;
+    plugin_fields[0].length = 5;
     plugin_fields[0].name = "netinfo";
     plugin_fields[0].type = PluginFieldType::kFLOAT32;
 
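Note on `Proto`: the shape arithmetic of the new block (cv1, 2x nearest-neighbour resize, cv2, cv3) can be sketched in Python. This is only an illustration. It assumes `convBlock` pads so that a stride-1 convolution preserves height and width (the padding rule is not visible in this diff), the helper names `conv_out` and `proto_shape` are hypothetical, and the 80x80 input and the 128/32 channel widths are example values rather than values taken from the commit.

```python
def conv_out(size, k, s, p):
    """Spatial output size of a convolution (floor formula)."""
    return (size + 2 * p - k) // s + 1


def proto_shape(h, w, c_, c2):
    """Trace (C, H, W) through cv1 -> 2x nearest upsample -> cv2 -> cv3.

    Assumption: padding is k // 2, which keeps H and W unchanged at stride 1.
    c_ is the intermediate channel width and does not affect the final shape.
    """
    # cv1: 3x3, stride 1, c_ output channels
    h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
    # resize with scales {1, 2, 2}: channels unchanged, H and W doubled
    h, w = 2 * h, 2 * w
    # cv2: 3x3, stride 1, c_ output channels
    h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
    # cv3: 1x1, stride 1, maps c_ channels to c2 channels
    h, w = conv_out(h, 1, 1, 0), conv_out(w, 1, 1, 0)
    return (c2, h, w)


print(proto_shape(80, 80, c_=128, c2=32))  # (32, 160, 160)
```

Under these assumptions the block doubles the spatial resolution of its input feature map and emits `c2` prototype channels.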
diff --git a/yolov5/gen_wts.py b/yolov5/gen_wts.py
@@ -13,7 +13,7 @@ def parse_args():
     parser.add_argument(
         '-o', '--output', help='Output (.wts) file path (optional)')
     parser.add_argument(
-        '-t', '--type', type=str, default='detect', choices=['detect', 'cls'],
+        '-t', '--type', type=str, default='detect', choices=['detect', 'cls', 'seg'],
         help='determines whether the model is detection/classification/segmentation')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
@@ -37,7 +37,7 @@ def parse_args():
 model = torch.load(pt_file, map_location=device)  # load to FP32
 model = model['ema' if model.get('ema') else 'model'].float()
 
-if m_type == "detect":
+if m_type in ['detect', 'seg']:
     # update anchor_grid info
     anchor_grid = model.model[-1].anchors * model.model[-1].stride[..., None, None]
     # model.model[-1].anchor_grid = anchor_grid
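
Note on `anchor_grid`: with this change the segmentation head goes through the same anchor export as the detection head. The broadcast `anchors * stride[..., None, None]` can be illustrated without torch. The sketch below assumes the head stores anchors in grid units (pixels divided by stride) and uses the default YOLOv5 anchor sizes as example data; the helper names are hypothetical.

```python
# Default YOLOv5 anchors in pixels, one row per detection scale (P3, P4, P5).
ANCHORS_PX = [
    [(10, 13), (16, 30), (33, 23)],
    [(30, 61), (62, 45), (59, 119)],
    [(116, 90), (156, 198), (373, 326)],
]
STRIDES = [8, 16, 32]


def to_grid_units(anchors_px, strides):
    """Assumed storage format in the head: anchor sizes divided by the stride."""
    return [[(w / s, h / s) for (w, h) in row]
            for row, s in zip(anchors_px, strides)]


def anchor_grid(anchors, strides):
    """Pure-Python analogue of anchors * stride[..., None, None]:
    every (w, h) pair of scale i is multiplied by strides[i]."""
    return [[(w * s, h * s) for (w, h) in row]
            for row, s in zip(anchors, strides)]


anchors = to_grid_units(ANCHORS_PX, STRIDES)
grid = anchor_grid(anchors, STRIDES)
print(grid[0])  # [(10.0, 13.0), (16.0, 30.0), (33.0, 23.0)]
```

The result is the anchor sizes back in input-image pixels, which is the form the `.anchor_grid` entry read by `getAnchors` is expected to hold.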
diff --git a/yolov5/yololayer.cu b/yolov5/yololayer.cu
@@ -25,12 +25,13 @@ using namespace Yolo;
 
 namespace nvinfer1
 {
-    YoloLayerPlugin::YoloLayerPlugin(int classCount, int netWidth, int netHeight, int maxOut, const std::vector<Yolo::YoloKernel>& vYoloKernel)
+    YoloLayerPlugin::YoloLayerPlugin(int classCount, int netWidth, int netHeight, int maxOut, bool is_segmentation, const std::vector<Yolo::YoloKernel>& vYoloKernel)
     {
         mClassCount = classCount;
         mYoloV5NetWidth = netWidth;
         mYoloV5NetHeight = netHeight;
         mMaxOutObject = maxOut;
+        is_segmentation_ = is_segmentation;
         mYoloKernel = vYoloKernel;
         mKernelCount = vYoloKernel.size();
 
@@ -63,6 +64,7 @@ namespace nvinfer1
         read(d, mYoloV5NetWidth);
         read(d, mYoloV5NetHeight);
         read(d, mMaxOutObject);
+        read(d, is_segmentation_);
         mYoloKernel.resize(mKernelCount);
         auto kernelSize = mKernelCount * sizeof(YoloKernel);
         memcpy(mYoloKernel.data(), d, kernelSize);
@@ -88,6 +90,7 @@ namespace nvinfer1
         write(d, mYoloV5NetWidth);
         write(d, mYoloV5NetHeight);
         write(d, mMaxOutObject);
+        write(d, is_segmentation_);
         auto kernelSize = mKernelCount * sizeof(YoloKernel);
         memcpy(d, mYoloKernel.data(), kernelSize);
         d += kernelSize;
@@ -97,7 +100,7 @@ namespace nvinfer1
 
     size_t YoloLayerPlugin::getSerializationSize() const TRT_NOEXCEPT
     {
-        return sizeof(mClassCount) + sizeof(mThreadCount) + sizeof(mKernelCount) + sizeof(Yolo::YoloKernel) * mYoloKernel.size() + sizeof(mYoloV5NetWidth) + sizeof(mYoloV5NetHeight) + sizeof(mMaxOutObject);
+        return sizeof(mClassCount) + sizeof(mThreadCount) + sizeof(mKernelCount) + sizeof(Yolo::YoloKernel) * mYoloKernel.size() + sizeof(mYoloV5NetWidth) + sizeof(mYoloV5NetHeight) + sizeof(mMaxOutObject) + sizeof(is_segmentation_);
     }
 
     int YoloLayerPlugin::initialize() TRT_NOEXCEPT
@@ -172,15 +175,15 @@ namespace nvinfer1
     // Clone the plugin
     IPluginV2IOExt* YoloLayerPlugin::clone() const TRT_NOEXCEPT
     {
-        YoloLayerPlugin* p = new YoloLayerPlugin(mClassCount, mYoloV5NetWidth, mYoloV5NetHeight, mMaxOutObject, mYoloKernel);
+        YoloLayerPlugin* p = new YoloLayerPlugin(mClassCount, mYoloV5NetWidth, mYoloV5NetHeight, mMaxOutObject, is_segmentation_, mYoloKernel);
         p->setPluginNamespace(mPluginNamespace);
         return p;
     }
 
     __device__ float Logist(float data) { return 1.0f / (1.0f + expf(-data)); };
 
     __global__ void CalDetection(const float *input, float *output, int noElements,
-        const int netwidth, const int netheight, int maxoutobject, int yoloWidth, int yoloHeight, const float anchors[CHECK_COUNT * 2], int classes, int outputElem)
+        const int netwidth, const int netheight, int maxoutobject, int yoloWidth, int yoloHeight, const float anchors[CHECK_COUNT * 2], int classes, int outputElem, bool is_segmentation)
     {
 
         int idx = threadIdx.x + blockDim.x * blockIdx.x;
@@ -190,14 +193,15 @@ namespace nvinfer1
         int bnIdx = idx / total_grid;
         idx = idx - total_grid * bnIdx;
         int info_len_i = 5 + classes;
+        if (is_segmentation) info_len_i += 32;
         const float* curInput = input + bnIdx * (info_len_i * total_grid * CHECK_COUNT);
 
         for (int k = 0; k < CHECK_COUNT; ++k) {
             float box_prob = Logist(curInput[idx + k * info_len_i * total_grid + 4 * total_grid]);
             if (box_prob < IGNORE_THRESH) continue;
             int class_id = 0;
             float max_cls_prob = 0.0;
-            for (int i = 5; i < info_len_i; ++i) {
+            for (int i = 5; i < 5 + classes; ++i) {
                 float p = Logist(curInput[idx + k * info_len_i * total_grid + i * total_grid]);
                 if (p > max_cls_prob) {
                     max_cls_prob = p;
@@ -230,6 +234,10 @@ namespace nvinfer1
             det->bbox[3] = det->bbox[3] * det->bbox[3] * anchors[2 * k + 1];
             det->conf = box_prob * max_cls_prob;
             det->class_id = class_id;
+
+            for (int i = 0; is_segmentation && i < 32; i++) {
+                det->mask[i] = curInput[idx + k * info_len_i * total_grid + (i + 5 + classes) * total_grid];
+            }
         }
     }
 
@@ -247,7 +255,7 @@ namespace nvinfer1
 
             //printf("Net: %d  %d \n", mYoloV5NetWidth, mYoloV5NetHeight);
             CalDetection << < (numElem + mThreadCount - 1) / mThreadCount, mThreadCount, 0, stream >> >
-                (inputs[i], output, numElem, mYoloV5NetWidth, mYoloV5NetHeight, mMaxOutObject, yolo.width, yolo.height, (float*)mAnchor[i], mClassCount, outputElem);
+                (inputs[i], output, numElem, mYoloV5NetWidth, mYoloV5NetHeight, mMaxOutObject, yolo.width, yolo.height, (float*)mAnchor[i], mClassCount, outputElem, is_segmentation_);
         }
     }
 
@@ -294,9 +302,10 @@ namespace nvinfer1
         int input_w = p_netinfo[1];
         int input_h = p_netinfo[2];
         int max_output_object_count = p_netinfo[3];
+        bool is_segmentation = (bool)p_netinfo[4];
         std::vector<Yolo::YoloKernel> kernels(fc->fields[1].length);
         memcpy(&kernels[0], fc->fields[1].data, kernels.size() * sizeof(Yolo::YoloKernel));
-        YoloLayerPlugin* obj = new YoloLayerPlugin(class_count, input_w, input_h, max_output_object_count, kernels);
+        YoloLayerPlugin* obj = new YoloLayerPlugin(class_count, input_w, input_h, max_output_object_count, is_segmentation, kernels);
         obj->setPluginNamespace(mNamespace.c_str());
         return obj;
     }
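
Note on the kernel indexing: the `CalDetection` changes amount to a wider per-anchor channel layout. A minimal Python sketch of the offset arithmetic follows; the function names are hypothetical, and `classes=80` with an 80x80 grid are example values only.

```python
MASK_COEFFS = 32  # matches the literal 32 in the kernel and in Detection::mask


def info_len(classes, is_segmentation):
    """Channels per anchor: x, y, w, h, objectness, class scores,
    then (segmentation only) the mask coefficients."""
    return 5 + classes + (MASK_COEFFS if is_segmentation else 0)


def offset(idx, k, channel, classes, total_grid, is_segmentation):
    """Flat offset of one channel for grid cell idx and anchor k,
    relative to the start of one image's input."""
    stride_per_anchor = info_len(classes, is_segmentation) * total_grid
    return idx + k * stride_per_anchor + channel * total_grid


def class_channels(classes):
    """Channels scanned by the class-score loop (5 .. 5 + classes - 1)."""
    return range(5, 5 + classes)


def mask_channels(classes):
    """Channels copied into det->mask (5 + classes .. 5 + classes + 31)."""
    return range(5 + classes, 5 + classes + MASK_COEFFS)


print(info_len(80, True))  # 117
print(offset(idx=0, k=1, channel=4, classes=80, total_grid=6400,
             is_segmentation=True))
```

This also shows why the class loop bound changes from `info_len_i` to `5 + classes`: with the old bound, the 32 mask coefficients would be scanned as if they were class scores.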
diff --git a/yolov5/yololayer.h b/yolov5/yololayer.h
@@ -27,6 +27,7 @@ namespace Yolo
         float bbox[LOCATIONS];
         float conf;  // bbox_conf * cls_conf
         float class_id;
+        float mask[32];
     };
 }
 
@@ -35,7 +36,7 @@ namespace nvinfer1
     class API YoloLayerPlugin : public IPluginV2IOExt
     {
     public:
-        YoloLayerPlugin(int classCount, int netWidth, int netHeight, int maxOut, const std::vector<Yolo::YoloKernel>& vYoloKernel);
+        YoloLayerPlugin(int classCount, int netWidth, int netHeight, int maxOut, bool is_segmentation, const std::vector<Yolo::YoloKernel>& vYoloKernel);
         YoloLayerPlugin(const void* data, size_t length);
         ~YoloLayerPlugin();
 
@@ -96,6 +97,7 @@ namespace nvinfer1
         int mYoloV5NetWidth;
         int mYoloV5NetHeight;
         int mMaxOutObject;
+        bool is_segmentation_;
         std::vector<Yolo::YoloKernel> mYoloKernel;
         void** mAnchor;
     };
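
Note on `Detection`: the added `mask[32]` field enlarges every detection record. The sketch below mirrors the struct with `ctypes` to show the resulting size. It assumes `LOCATIONS` is 4 (its definition is outside this diff) and that all fields are 32-bit floats, as declared in the header.

```python
import ctypes

LOCATIONS = 4  # assumption: bbox holds 4 values; the constant is defined outside this diff


class Detection(ctypes.Structure):
    """Python mirror of Yolo::Detection after this change."""
    _fields_ = [
        ("bbox", ctypes.c_float * LOCATIONS),
        ("conf", ctypes.c_float),      # bbox_conf * cls_conf
        ("class_id", ctypes.c_float),
        ("mask", ctypes.c_float * 32),  # mask coefficients, filled only for segmentation
    ]


floats_per_detection = ctypes.sizeof(Detection) // ctypes.sizeof(ctypes.c_float)
print(ctypes.sizeof(Detection), floats_per_detection)  # 152 38
```

Because the struct is shared, detection-only engines built from this header would also appear to carry the 32 extra floats per record, so host code that walks the output buffer should step by `sizeof(Detection)` rather than a hard-coded 6 floats.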
diff --git a/yolov5/yolov5_seg.cpp b/yolov5/yolov5_seg.cpp