Conversation

@yaozengwei (Owner) commented May 1, 2022

This is an implementation of Dan's idea about model averaging (see k2-fsa#337).

@yaozengwei (Owner, Author) commented May 2, 2022

The code is based on egs/librispeech/pruned_transducer_stateless2.
During training, the averaged model model_avg is updated every average_period batches with:
model_avg = (average_period / batch_idx_train) * model + ((batch_idx_train - average_period) / batch_idx_train) * model_avg
During decoding, let start = batch_idx_train of model-start and end = batch_idx_train of model-end. The model averaged over epochs [start+1, start+2, ..., end] is then avg = (model_end * end - model_start * start) / (end - start).
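As a sanity check on these two formulas, here is a small scalar simulation (not the icefall code; the parameter values and batch counts are made up for illustration) showing that the decode-time combination recovers the plain mean of the snapshots taken in the range (start, end]:

```python
average_period = 100  # illustrative value

def update_avg(model_avg, model, batch_idx_train, average_period):
    # model_avg = (p / t) * model + ((t - p) / t) * model_avg
    p, t = average_period, batch_idx_train
    return (p / t) * model + ((t - p) / t) * model_avg

# Pretend the model has a single scalar parameter whose value at batch t is t.
model_avg = 0.0
snapshots = {}
for t in range(average_period, 3001, average_period):
    model = float(t)
    model_avg = update_avg(model_avg, model, t, average_period)
    snapshots[t] = model_avg  # what a checkpoint would store

# Decode time: combine two saved averages to get the mean over (start, end].
start, end = 1000, 3000
avg = (snapshots[end] * end - snapshots[start] * start) / (end - start)

# Compare against the plain mean of the snapshots in (start, end].
expected = sum(range(start + average_period, end + 1, average_period)) / (
    (end - start) // average_period
)
print(abs(avg - expected) < 1e-6)  # True
```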
When trained on train-clean-100 with 3 GPUs for 30 epochs and average_period=100, I got the following WERs (test-clean & test-other) with greedy search decoding:

  • decode with epoch-29, avg=5, 7.14 & 19.33 (without averaged model) -> 7.03 & 18.85 (with averaged model);
  • decode with epoch-29, avg=10, 6.99 & 18.93 (without averaged model) -> 6.91 & 18.65 (with averaged model).

When trained on full librispeech with 6 GPUs for 30 epochs and average_period=100, I got the following WERs (test-clean & test-other) with greedy search decoding:

  • decode with epoch-29, avg=5, 2.77 & 6.77 (without averaged model) -> 2.72 & 6.67 (with averaged model);
  • decode with epoch-29, avg=10, 2.78 & 6.68 (without averaged model) -> 2.74 & 6.67 (with averaged model).

"""
Usage:
(1) greedy search
./pruned_transducer_stateless2/decode.py \
@csukuangfj commented May 2, 2022

Suggested change:
- ./pruned_transducer_stateless2/decode.py \
+ ./pruned_transducer_stateless3/decode.py \

Also, please sync with the latest k2/icefall and rename it to pruned_transducer_stateless4

@yaozengwei (Owner, Author) replied:

Ok.

    model.load_state_dict(average_checkpoints(filenames, device=device))
else:
    assert params.iter == 0
    start = params.epoch - params.avg


Please add more documentation to --use-average-model.
From the current help info, it is not clear how it is used in the code.
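For example, the help text could spell out both what is stored and how decoding uses it. This is only a sketch of possible wording, not the merged code; `str2bool` here is a minimal stand-in for icefall's utility of the same name:

```python
import argparse

def str2bool(v: str) -> bool:
    # Minimal stand-in for icefall's str2bool utility.
    return v.lower() in ("true", "1", "yes", "y")

parser = argparse.ArgumentParser()
parser.add_argument(
    "--use-average-model",
    type=str2bool,
    default=False,
    help=(
        "If true, load the averaged model (model_avg) that was maintained "
        "during training, and combine the saved averages with "
        "(model_end * end - model_start * start) / (end - start); "
        "otherwise average the epoch checkpoints directly."
    ),
)

args = parser.parse_args(["--use-average-model", "true"])
print(args.use_average_model)  # True
```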

filename_start = f"{params.exp_dir}/epoch-{start}.pt"
filename_end = f"{params.exp_dir}/epoch-{params.epoch}.pt"
logging.info(
    f"averaging modes over range with {filename_start} (excluded) "


Suggested change:
- f"averaging modes over range with {filename_start} (excluded) "
+ f"averaging models over range with {filename_start} (excluded) "

checkpoint.pop("model")

if model_avg is not None and "model_avg" in checkpoint:
    model_avg.load_state_dict(checkpoint["model_avg"], strict=strict)


Please add a log here, e.g., saying "loading averaged model".

@yaozengwei (Owner, Author) replied:

Ok.

Comment on lines 423 to 436
# Identify shared parameters. Two parameters are said to be shared
# if they have the same data_ptr.
uniqued: Dict[int, str] = dict()
for k, v in avg.items():
    v_data_ptr = v.data_ptr()
    if v_data_ptr in uniqued:
        continue
    uniqued[v_data_ptr] = k

uniqued_names = list(uniqued.values())
for k in uniqued_names:
    avg[k] *= weight_end
    avg[k] += model_start[k] * weight_start


This part is almost the same as the above function. Please refactor it to reduce redundant code.
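One way to do that (a sketch, not the eventual PR code): pull the data_ptr deduplication into a helper that both call sites can share. `unique_param_names` is a hypothetical name:

```python
from typing import Dict, List

def unique_param_names(state_dict) -> List[str]:
    # Return one parameter name per underlying storage. Tied parameters
    # share a data_ptr and must be scaled/averaged only once.
    seen: Dict[int, str] = {}
    for name, tensor in state_dict.items():
        ptr = tensor.data_ptr()
        if ptr not in seen:
            seen[ptr] = name
    return list(seen.values())
```

Both averaging loops then reduce to `for k in unique_param_names(avg): ...`.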

parser.add_argument(
    "--start-epoch",
    type=int,
    default=0,


Please change it so that epoch is counted from 1, not 0.
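With 1-based counting, the flag might look like this (a sketch only; the resume-file naming in the help text is an assumption, not the merged code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--start-epoch",
    type=int,
    default=1,
    help=(
        "Resume training from this epoch, counted from 1. If larger "
        "than 1, the checkpoint epoch-{start_epoch - 1}.pt is loaded first."
    ),
)

args = parser.parse_args([])
# The training loop then runs over range(args.start_epoch, num_epochs + 1).
print(args.start_epoch)  # 1
```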

def load_checkpoint_if_available(
    params: AttributeDict,
    model: nn.Module,
    model_avg: nn.Module = None,


Suggested change:
- model_avg: nn.Module = None,
+ model_avg: Optional[nn.Module] = None,

    The return value of :func:`get_params`.
  model:
    The training model.
  optimizer:


Please update the doc to include model_avg.

logging.info(f"Number of model parameters: {num_param}")

assert params.save_every_n >= params.average_period
model_avg: nn.Module = None


Suggested change:
- model_avg: nn.Module = None
+ model_avg: Optional[nn.Module] = None
