
Commit 652b0d1

Merge pull request #146 from zhanghuiyao/r0.1

update README

2 parents 96bc73c + 820a9b9

File tree

5 files changed (+73, -219 lines)

README.md

Lines changed: 19 additions & 10 deletions
@@ -12,19 +12,24 @@
   </a>
 </p>

-MindYOLO is [MindSpore Lab](https://github.com/mindspore-lab)'s software system that implements state-of-the-art YOLO series algorithms; see the [support list and benchmark](MODEL_ZOO.md). It is written in Python and powered by the [MindSpore](https://mindspore.cn/) deep learning framework.
+MindYOLO is [MindSpore Lab](https://github.com/mindspore-lab)'s software toolbox that implements state-of-the-art YOLO series algorithms; see the [support list and benchmark](MODEL_ZOO.md). It is written in Python and powered by the [MindSpore](https://mindspore.cn/) AI framework.

-The master branch works with **MindSpore 1.8.1**.
+The r0.1 branch supports **MindSpore 1.8.1**.

 <img src=".github/000000137950.jpg" />

 ## What is New
-- 2023/03/30
-  1. The models supported by the first release include the basic specifications of YOLOv3/YOLOv5/YOLOv7.
-  2. Models can be exported to MindIR/AIR format for deployment.
-  3. ⚠️ The current version is based on the static shape of GRAPH mode; dynamic shape in PYNATIVE mode will be added later. Stay tuned.
-  4. ⚠️ The current version only supports the Ascend platform; the GPU platform will be supported later.
+- 2023/06/15
+  1. New version v0.1 is released!
+  2. Support 6 models (YOLOv3/v4/v5/v7/v8/X) and release 23 weights; see [MODEL ZOO](MODEL_ZOO.md) for details.
+  3. Models can be exported to MindIR/AIR format for deployment.
+  4. New online documents are available!

 ## Benchmark and Model Zoo

@@ -35,11 +40,10 @@ See [MODEL ZOO](MODEL_ZOO.md).

 - [x] [YOLOv8](configs/yolov8)
 - [x] [YOLOv7](configs/yolov7)
+- [x] [YOLOX](configs/yolox)
 - [x] [YOLOv5](configs/yolov5)
-- [x] [YOLOv3](configs/yolov3)
 - [x] [YOLOv4](configs/yolov4)
-- [x] [YOLOX](configs/yolox)
-- [ ] [YOLOv6](configs/yolov6)
+- [x] [YOLOv3](configs/yolov3)

 </details>

@@ -61,6 +65,8 @@ MindSpore can be easily installed by following the official [instructions](https

 The following instructions assume that the desired dependencies are fulfilled.

+⚠️ The current version only supports the Ascend platform; the GPU platform will be supported later.
+
 ## Getting Started

 See [GETTING STARTED](GETTING_STARTED.md)
@@ -70,6 +76,9 @@ See [GETTING STARTED](GETTING_STARTED.md)

 To be supplemented.

 ## Notes
+
+⚠️ The current version is based on the static shape of GRAPH mode; dynamic shape in PYNATIVE mode will be added later. Stay tuned.
+
 ### How to Contribute

 We appreciate all contributions, including issues and PRs, to make MindYOLO better.
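Both the old and new "What is New" lists state that models export to MindIR/AIR for deployment. As a hedged illustration of what such an export looks like with MindSpore's generic `mindspore.export` API (the tiny stand-in network below is hypothetical, and this is not MindYOLO's own export entry point, which this diff does not show):

```python
# Minimal export sketch. TinyNet is a hypothetical stand-in; in practice the
# exported net would be a trained MindYOLO detector in inference mode.
import numpy as np
import mindspore as ms
from mindspore import nn

class TinyNet(nn.Cell):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)

    def construct(self, x):
        return self.conv(x)

net = TinyNet()
net.set_train(False)
dummy_input = ms.Tensor(np.zeros((1, 3, 640, 640), np.float32))  # NCHW input shape assumed

# file_format="AIR" targets Ascend; "MINDIR" is the framework-neutral format.
ms.export(net, dummy_input, file_name="tiny_net", file_format="MINDIR")
```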

README_CN.md

Lines changed: 12 additions & 9 deletions
@@ -14,19 +14,18 @@

 MindYOLO is an AI toolbox developed by [MindSpore Lab](https://github.com/mindspore-lab) that implements state-of-the-art YOLO series algorithms; see the [list of supported models](MODEL_ZOO.md).

-MindYOLO is written in Python and built on the [MindSpore](https://mindspore.cn/) deep learning framework; it works with **MindSpore 1.8.1**.
+MindYOLO is written in Python and built on the [MindSpore](https://mindspore.cn/) AI framework; it works with **MindSpore 1.8.1**.

 <img src=".github/000000137950.jpg" />

 ## What is New

-- 2023/03/30
-  1. The current version supports the basic specifications of YOLOv3/YOLOv5/YOLOv7.
-  2. Models can be exported to MindIR/AIR format for deployment.
-  3. ⚠️ The current version is based on the static shape of GRAPH mode; dynamic shape in PYNATIVE mode will be added later. Stay tuned.
-  4. ⚠️ The current version only supports the Ascend platform; the GPU platform will be supported in a later release.
+- 2023/06/15
+  1. New version v0.1 is released!
+  2. Support 6 models (YOLOv3/v4/v5/X/v7/v8) and release 23 model weights; see [MODEL ZOO](MODEL_ZOO.md) for details.
+  3. Support weight export in MindIR/AIR format.

 ## Benchmark and Model Zoo
@@ -38,11 +37,10 @@ MindYOLO is written in Python and built on the [MindSpore](https://mindspore.cn/)

 - [x] [YOLOv8](configs/yolov8)
 - [x] [YOLOv7](configs/yolov7)
+- [x] [YOLOX](configs/yolox)
 - [x] [YOLOv5](configs/yolov5)
 - [x] [YOLOv3](configs/yolov3)
 - [x] [YOLOv4](configs/yolov4)
-- [x] [YOLOX](configs/yolox)
-- [ ] [YOLOv6](configs/yolov6)

 </details>
@@ -61,7 +59,9 @@ MindYOLO is written in Python and built on the [MindSpore](https://mindspore.cn/)
 pip install -r requirements.txt
 ```

-Assuming you have installed the required dependencies, you can easily install MindSpore by following the [official instructions](https://www.mindspore.cn/install), where you can select the hardware platform best suited to you. To run in distributed mode, [openmpi](https://www.open-mpi.org/software/ompi/v4.0/) is required.
+Then install MindSpore by following the [official instructions](https://www.mindspore.cn/install), where you can select the hardware platform best suited to you. To run in distributed mode, [openmpi](https://www.open-mpi.org/software/ompi/v4.0/) is required.
+
+⚠️ The current version only supports the Ascend platform; the GPU platform will be supported in a later release.

 ## Quick Start

@@ -72,6 +72,9 @@ pip install -r requirements.txt

 Coming soon.

 ## Notes
+
+⚠️ The current version is based on the static shape of GRAPH mode; dynamic shape in PYNATIVE mode will be added later. Stay tuned.
+
 ### How to Contribute

 We appreciate all contributions from developers and users, including issues and PRs, to make MindYOLO better.

RELEASE.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# Release Note
+
+## 0.1.0
+
+- 2023/06/15
+  1. Add 3 new models with training recipes and pretrained weights:
+     - [YOLOv4](configs/yolov4)
+     - [YOLOv8](configs/yolov8)
+     - [YOLOX](configs/yolox)
+  2. Support MindSpore 2.0.
+  3. Support deployment on MindSpore Lite 2.0.
+  4. New online documents are available.
+
+## 0.0.1-alpha
+
+- 2023/03/30
+  1. Add new models with training recipes and pretrained weights:
+     - [YOLOv3](./configs/yolov3)
+     - [YOLOv5](./configs/yolov5)
+     - [YOLOv7](./configs/yolov7)
+  2. Support file export as MindIR/AIR for deployment.
+  3. Support training with EMA.
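RELEASE.md notes EMA support in training. As a generic sketch of the technique itself, in plain NumPy rather than MindYOLO's actual EMA class (the trainer wraps MindSpore parameters as `self.ema`), an exponential moving average keeps shadow weights pulled slowly toward the live ones:

```python
# Generic EMA-of-weights sketch (NumPy); the class name EmaShadow is hypothetical.
import numpy as np

class EmaShadow:
    def __init__(self, params, decay=0.9999):
        self.decay = decay
        # shadow starts as a copy of the initial weights
        self.shadow = {name: value.copy() for name, value in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * live weight
        for name, value in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * value

params = {"w": np.zeros(3, np.float32)}
ema = EmaShadow(params, decay=0.9)
params["w"] += 1.0      # pretend one optimizer step moved the weights
ema.update(params)
print(ema.shadow["w"])  # [0.1 0.1 0.1]: the shadow moved 10% of the way
```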

mindyolo/utils/trainer_factory.py

Lines changed: 0 additions & 164 deletions
@@ -213,170 +213,6 @@ def train(
         self._on_train_end(run_context)
         logger.info("End Train.")

-    def train_with_datasink(
-        self,
-        epochs: int,
-        main_device: bool,
-        warmup_epoch: int = 0,
-        warmup_momentum: Union[list, None] = None,
-        keep_checkpoint_max: int = 10,
-        loss_item_name: list = [],
-        save_dir: str = "",
-        enable_modelarts: bool = False,
-        train_url: str = "",
-        run_eval: bool = False,
-        test_fn: types.FunctionType = None,
-        overflow_still_update: bool = False,
-        ms_jit: bool = True,
-        rank_size: int = 8,
-    ):
-        # Rename dataset columns for data sink mode, because the dataloader cannot send string data to the device.
-        def modify_dataset_columns(image, labels, img_files):
-            return image, labels
-
-        loader = self.dataloader.map(
-            modify_dataset_columns,
-            input_columns=["image", "labels", "img_files"],
-            output_columns=["image", "labels"],
-            column_order=["image", "labels"],
-        )
-
-        # To be compatible with the old interface
-        has_eval_mask = list(isinstance(c, EvalWhileTrain) for c in self.callback)
-        if run_eval and not any(has_eval_mask):
-            self.callback.append(EvalWhileTrain())
-        if not run_eval and any(has_eval_mask):
-            ind = has_eval_mask.index(True)
-            self.callback.pop(ind)
-
-        # Change warmup_momentum from a per-step list to a per-epoch list
-        warmup_momentum = (
-            [warmup_momentum[_i * self.steps_per_epoch] for _i in range(warmup_epoch)]
-            + [
-                warmup_momentum[-1],
-            ]
-            * (epochs - warmup_epoch)
-            if warmup_momentum
-            else None
-        )
-
-        # Build the train-epoch function with the sink process
-        train_epoch_fn = ms.train.data_sink(
-            fn=self.train_step_fn,
-            dataset=loader,
-            sink_size=self.steps_per_epoch,
-            steps=epochs * self.steps_per_epoch,
-            jit=True,
-        )
-
-        # Attributes
-        self.epochs = epochs
-        self.main_device = main_device
-        self.loss_item_name = loss_item_name
-
-        # Directories
-        ckpt_save_dir = os.path.join(save_dir, "weights")
-        sync_lock_dir = os.path.join(save_dir, "sync_locks") if not enable_modelarts else "/tmp/sync_locks"
-        if self.summary:
-            summary_dir = os.path.join(save_dir, "summary")
-            self.summary_record = SummaryRecord(summary_dir)
-        if main_device:
-            os.makedirs(ckpt_save_dir, exist_ok=True)  # checkpoint save path
-            os.makedirs(sync_lock_dir, exist_ok=False)  # sync lock for run_eval
-
-        # Set checkpoint managers
-        manager = CheckpointManager(ckpt_save_policy="latest_k")
-        manager_ema = CheckpointManager(ckpt_save_policy="latest_k") if self.ema else None
-        manager_best = CheckpointManager(ckpt_save_policy="top_k") if run_eval else None
-        ckpt_filelist_best = []
-
-        run_context = RunContext(
-            epoch_num=epochs,
-            steps_per_epoch=self.steps_per_epoch,
-            total_steps=self.dataloader.dataset_size,
-            trainer=self,
-            test_fn=test_fn,
-            enable_modelarts=enable_modelarts,
-            sync_lock_dir=sync_lock_dir,
-            ckpt_save_dir=ckpt_save_dir,
-            train_url=train_url,
-            overflow_still_update=overflow_still_update,
-            ms_jit=ms_jit,
-            rank_size=rank_size,
-        )
-
-        s_epoch_time = time.time()
-        self._on_train_begin(run_context)
-        for epoch in range(epochs):
-            cur_epoch = epoch + 1
-            run_context.cur_epoch_index = cur_epoch
-            if epoch == 0:
-                logger.warning("In data sink mode, log output occurs only once each epoch completes.")
-                logger.warning(
-                    "The first epoch will be compiled into a graph, which may take a long time; "
-                    "you can come back later :)."
-                )
-
-            if warmup_momentum and isinstance(self.optimizer, (nn.SGD, nn.Momentum)):
-                dtype = self.optimizer.momentum.dtype
-                self.optimizer.momentum = Tensor(warmup_momentum[epoch], dtype)
-
-            # Train one epoch with data sink
-            self._on_train_epoch_begin(run_context)
-            _, loss_item, _, _ = train_epoch_fn()
-            self._on_train_epoch_end(run_context)
-
-            # Print loss and lr
-            log_string = f"Epoch {cur_epoch}/{epochs}, Step {self.steps_per_epoch}/{self.steps_per_epoch}"
-            if len(self.loss_item_name) < len(loss_item):
-                self.loss_item_name += [f"loss_item{i}" for i in range(len(loss_item) - len(self.loss_item_name))]
-            for i in range(len(loss_item)):
-                log_string += f", {self.loss_item_name[i]}: {loss_item[i].asnumpy():.4f}"
-                if self.summary:
-                    self.summary_record.add_value("scalar", f"{self.loss_item_name[i]}", Tensor(loss_item[i].asnumpy()))
-            if self.optimizer.dynamic_lr:
-                if self.optimizer.is_group_lr:
-                    lr_cell = self.optimizer.learning_rate[0]
-                    cur_lr = lr_cell(Tensor(self.global_step, ms.int32)).asnumpy().item()
-                else:
-                    cur_lr = self.optimizer.learning_rate(Tensor(self.global_step, ms.int32)).asnumpy().item()
-            else:
-                cur_lr = self.optimizer.learning_rate.asnumpy().item()
-            log_string += f", cur_lr: {cur_lr}"
-            logger.info(log_string)
-
-            # Save checkpoints per epoch on the main device
-            if self.main_device:
-                ms.save_checkpoint(
-                    self.optimizer, os.path.join(ckpt_save_dir, f"optim_{self.model_name}.ckpt"), async_save=True
-                )
-                save_path = os.path.join(ckpt_save_dir, f"{self.model_name}-{cur_epoch}_{self.steps_per_epoch}.ckpt")
-                manager.save_ckpoint(self.network, num_ckpt=keep_checkpoint_max, save_path=save_path)
-                if self.ema:
-                    save_path_ema = os.path.join(
-                        ckpt_save_dir, f"EMA_{self.model_name}-{cur_epoch}_{self.steps_per_epoch}.ckpt"
-                    )
-                    manager_ema.save_ckpoint(self.ema.ema, num_ckpt=keep_checkpoint_max, save_path=save_path_ema)
-                logger.info(f"Saving model to {save_path}")
-
-                if enable_modelarts:
-                    sync_data(save_path, train_url + "/weights/" + save_path.split("/")[-1])
-                    if self.ema:
-                        sync_data(save_path_ema, train_url + "/weights/" + save_path_ema.split("/")[-1])
-
-            logger.info(f"Epoch {cur_epoch}/{epochs}, epoch time: {(time.time() - s_epoch_time) / 60:.2f} min.")
-            s_epoch_time = time.time()
-
-        if enable_modelarts and self.summary:
-            for p in os.listdir(summary_dir):
-                summary_file_path = os.path.join(summary_dir, p)
-                sync_data(summary_file_path, train_url + "/summary/" + summary_file_path.split("/")[-1])
-        if self.summary:
-            self.summary_record.close()
-        self._on_train_end(run_context)
-        logger.info("End Train.")
-
     def train_step(self, imgs, labels, cur_step=0, cur_epoch=0):
         if self.accumulate == 1:
             loss, loss_item, _, grads_finite = self.train_step_fn(imgs, labels, True)
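For reference, the deleted `train_with_datasink` hinged on one call: `ms.train.data_sink` wraps the jitted per-batch step so that a whole epoch of batches is pumped through on device before control returns to Python, which is why the removed code warned that logs appear only once per epoch. Below is a condensed sketch of that pattern, reusing the argument names exactly as they appeared in the removed call; the function parameters stand in for the trainer's attributes, and the `steps`/`jit` keywords are taken from the removed code rather than verified against other MindSpore releases:

```python
# Condensed from the removed method: data_sink replaces the Python-side batch
# loop, and each train_epoch_fn() call drives sink_size steps on device.
import mindspore as ms

def build_and_run_sink_loop(train_step_fn, loader, steps_per_epoch, epochs):
    # Arguments mirror the removed call in trainer_factory.py.
    train_epoch_fn = ms.train.data_sink(
        fn=train_step_fn,                 # jitted per-batch step: (image, labels) -> loss outputs
        dataset=loader,                   # string columns already mapped away; device queues cannot carry them
        sink_size=steps_per_epoch,        # one Python call == one epoch on device
        steps=epochs * steps_per_epoch,   # total sunk steps for the whole run
        jit=True,
    )
    for epoch in range(epochs):
        # Control returns here only once per epoch, hence the once-per-epoch logging.
        _, loss_item, _, _ = train_epoch_fn()
        print(f"epoch {epoch + 1}/{epochs}, loss items: {loss_item}")
```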

train.py

Lines changed: 18 additions & 36 deletions
@@ -44,7 +44,6 @@ def get_parser_train(parents=None):
     parser.add_argument(
         "--ms_enable_graph_kernel", type=ast.literal_eval, default=False, help="use enable_graph_kernel or not"
     )
-    parser.add_argument("--ms_datasink", type=ast.literal_eval, default=False, help="Train with datasink.")
     parser.add_argument("--overflow_still_update", type=ast.literal_eval, default=True, help="overflow still update")
     parser.add_argument("--ema", type=ast.literal_eval, default=True, help="ema")
     parser.add_argument("--weight", type=str, default="", help="initial weight path")
@@ -264,41 +263,24 @@ def train(args):
         callback=callback_fns,
         reducer=reducer,
     )
-    if not args.ms_datasink:
-        trainer.train(
-            epochs=args.epochs,
-            main_device=main_device,
-            warmup_step=max(round(args.optimizer.warmup_epochs * steps_per_epoch), args.optimizer.min_warmup_step),
-            warmup_momentum=warmup_momentum,
-            accumulate=args.accumulate,
-            overflow_still_update=args.overflow_still_update,
-            keep_checkpoint_max=args.keep_checkpoint_max,
-            log_interval=args.log_interval,
-            loss_item_name=[] if not hasattr(loss_fn, "loss_item_name") else loss_fn.loss_item_name,
-            save_dir=args.save_dir,
-            enable_modelarts=args.enable_modelarts,
-            train_url=args.train_url,
-            run_eval=args.run_eval,
-            test_fn=test_fn,
-            rank_size=args.rank_size,
-            ms_jit=args.ms_jit
-        )
-    else:
-        logger.warning("DataSink is an experimental interface under development.")
-        logger.warning("Train with data sink mode.")
-        trainer.train_with_datasink(
-            epochs=args.epochs,
-            main_device=main_device,
-            warmup_epoch=max(args.optimizer.warmup_epochs, args.optimizer.min_warmup_step // steps_per_epoch),
-            warmup_momentum=warmup_momentum,
-            keep_checkpoint_max=args.keep_checkpoint_max,
-            loss_item_name=[] if not hasattr(loss_fn, "loss_item_name") else loss_fn.loss_item_name,
-            save_dir=args.save_dir,
-            enable_modelarts=args.enable_modelarts,
-            train_url=args.train_url,
-            run_eval=args.run_eval,
-            test_fn=test_fn,
-        )
+    trainer.train(
+        epochs=args.epochs,
+        main_device=main_device,
+        warmup_step=max(round(args.optimizer.warmup_epochs * steps_per_epoch), args.optimizer.min_warmup_step),
+        warmup_momentum=warmup_momentum,
+        accumulate=args.accumulate,
+        overflow_still_update=args.overflow_still_update,
+        keep_checkpoint_max=args.keep_checkpoint_max,
+        log_interval=args.log_interval,
+        loss_item_name=[] if not hasattr(loss_fn, "loss_item_name") else loss_fn.loss_item_name,
+        save_dir=args.save_dir,
+        enable_modelarts=args.enable_modelarts,
+        train_url=args.train_url,
+        run_eval=args.run_eval,
+        test_fn=test_fn,
+        rank_size=args.rank_size,
+        ms_jit=args.ms_jit
+    )
     logger.info("Training completed.")
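One detail worth noting in the surviving `trainer.train` call: `warmup_step` takes the larger of `warmup_epochs * steps_per_epoch` (rounded) and `min_warmup_step`, so short datasets still get a floor on warmup length. A quick worked check, with numbers invented purely for illustration:

```python
# Worked example of the warmup_step expression used in train.py above;
# the concrete values are made up.
warmup_epochs = 3
steps_per_epoch = 100
min_warmup_step = 1000

warmup_step = max(round(warmup_epochs * steps_per_epoch), min_warmup_step)
print(warmup_step)  # 1000 -- the floor wins, since 3 epochs are only 300 steps
```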
