add support for not loading weights #1424
Conversation
I have some questions but overall the change itself looks good.
torchtext/models/roberta/bundler.py (Outdated)
```diff
@@ -56,7 +56,10 @@ class RobertaModelBundle:
     _head: Optional[Module] = None
     transform: Optional[Callable] = None

-    def get_model(self, head: Optional[Module] = None, *, dl_kwargs=None) -> RobertaModel:
+    def get_model(self, load_weights=True, head: Optional[Module] = None, *, dl_kwargs=None) -> RobertaModel:
```
A few comments.
- Adding an argument in front of an existing one can be BC-breaking if this API is already part of a previous release.
- Between `head` and `load_weights`, which one do you think is used more often? I think the one used more frequently should come first.
Also this makes me wonder how the combination of `load_weights` and a custom head should behave. Is the provided custom head expected to be trained or untrained? If `load_weights=False`, this does not matter, but if `load_weights=True`, there are two cases: the provided custom head comes with pre-trained weights, or it does not. Now, looking at the logic where the state dict is loaded (`model.load_state_dict(state_dict, strict=False)`), isn't this code overwriting the weights for the given custom head if the keys match? Of course, if that's the spec and it is documented somewhere, that's okay (and this is out of scope for this PR), but I did not realize this when I reviewed the original PR for the `get_model` logic. What do you think?
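To illustrate the concern, here is a standalone toy example (not torchtext code; the module names and shapes are made up) showing that `strict=False` only relaxes checks for missing/unexpected keys, while matching keys are still loaded:

```python
import torch
from torch import nn


class Head(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)


model = nn.ModuleDict({"encoder": nn.Linear(4, 4), "head": Head()})

# A checkpoint that happens to contain weights under the same "head.*" keys.
state_dict = {
    "encoder.weight": torch.zeros(4, 4),
    "encoder.bias": torch.zeros(4),
    "head.linear.weight": torch.ones(2, 4),
    "head.linear.bias": torch.ones(2),
}

# strict=False does not protect matching keys: the user's head
# initialization is silently replaced by the checkpoint values.
model.load_state_dict(state_dict, strict=False)
print(model["head"].linear.weight)  # now all ones
```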
> A few comments.
>
> - Adding an argument in front of an existing one can be BC-breaking if this API is already part of a previous release.
They are not released yet, so we are OK breaking it if necessary.
> - Between `head` and `load_weights`, which one do you think is used more often? I think the one used more frequently should come first.
It's a good point. I cannot say for sure, but my guess is that users will want to plug in their custom heads more often than they set `load_weights=False` (since they would still like to use the pre-trained encoder weights). So I will change the order.
> Also this makes me wonder how the combination of `load_weights` and a custom head should behave. Is the provided custom head expected to be trained or untrained? If `load_weights=False`, this does not matter, but if `load_weights=True`, there are two cases: the provided custom head comes with pre-trained weights, or it does not. Now, looking at the logic where the state dict is loaded (`model.load_state_dict(state_dict, strict=False)`), isn't this code overwriting the weights for the given custom head if the keys match? Of course, if that's the spec and it is documented somewhere, that's okay (and this is out of scope for this PR), but I did not realize this when I reviewed the original PR for the `get_model` logic. What do you think?
Thanks for surfacing this. I have yet to figure out the final behavior and document it properly. One idea would be to make sure that when the user provides a custom head, we only load pre-trained weights for the encoder, leaving the custom head in the same state the user provided it in. In that case, we would not have to worry about overwriting weights when the keys match. WDYT?
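A minimal sketch of that idea, assuming encoder parameters live under an `encoder.` prefix in the checkpoint (the prefix and helper name are assumptions here, not the actual torchtext implementation):

```python
def load_encoder_weights_only(model, state_dict, user_provided_head):
    """Load checkpoint weights, skipping head keys when the user supplied a head."""
    if user_provided_head is not None:
        # Keep only encoder parameters so the custom head stays exactly as
        # the user initialized it, even if the checkpoint has matching keys.
        state_dict = {k: v for k, v in state_dict.items() if k.startswith("encoder.")}
    model.load_state_dict(state_dict, strict=False)
    return model
```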
Yeah, I think it makes more sense to leave the custom head provided by the user untouched.
torchtext/models/roberta/bundler.py (Outdated)
```python
    def get_model(self, load_weights=True, head: Optional[Module] = None, *, dl_kwargs=None) -> RobertaModel:

        if load_weights:
            assert self._path is not None, "load_weights cannot be True when _path is not set"
```
`self._path` is abstracted away from regular users, so I think rephrasing the message without referring to an internal attribute would be better. Otherwise, I would wonder, "Did I do something wrong with `_path`?"
Makes sense!
Follow-up on #1406
We would like to add support for not loading weights from pre-trained models, so that users can still instantiate the standard model architectures without initializing them from pre-trained weights, in order to support training from scratch.
Example usage:
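For instance, a sketch assuming the `XLMR_BASE_ENCODER` bundle (the keyword placement follows the diff above and may differ from the final API):

```python
from torchtext.models import XLMR_BASE_ENCODER

# Current behavior: standard architecture initialized with pre-trained weights.
model = XLMR_BASE_ENCODER.get_model()

# New behavior: same architecture, randomly initialized, for training from scratch.
model = XLMR_BASE_ENCODER.get_model(load_weights=False)
```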