Update Model Card for Encoder Decoder Model #39272
Conversation
Thanks for working on this one!
@@ -14,8 +14,6 @@ rendered properly in your Markdown viewer.
-->

# Encoder Decoder Models

<div class="flex flex-wrap space-x-1">
Wrap the badges with the below to align them to the right
<div style="float: right;">
...
</div>
The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://huggingface.co/papers/1907.12461) by
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) has two main parts:
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) has two main parts:
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) initializes a sequence-to-sequence model with any pretrained autoencoder and pretrained autoregressive model. It is effective for sequence generation tasks as demonstrated in [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) which uses [`BertModel`] as the encoder and decoder.
Encoder: This part reads the input text and converts it into a set of numerical features capturing the meaning and context
of the input.
Encoder: This part reads the input text and converts it into a set of numerical features capturing the meaning and context
of the input.
An application of this architecture could be to leverage two pretrained [`BertModel`] as the encoder
and decoder for a summarization model as was shown in: [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) by Yang Liu and Mirella Lapata.
Decoder: This part takes the numerical features from the encoder and generates the output text step by step. It uses the information from the encoder to produce meaningful and relevant output, such as a translation, a summary, or an answer.
Decoder: This part takes the numerical features from the encoder and generates the output text step by step. It uses the information from the encoder to produce meaningful and relevant output, such as a translation, a summary, or an answer.
We can be use this model class to initialize a sequence-to-sequence model with any pretrained autoencoding model as the
encoder and any pretrained autoregressive model as the decoder.
We can be use this model class to initialize a sequence-to-sequence model with any pretrained autoencoding model as the
encoder and any pretrained autoregressive model as the decoder.
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show
how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show
how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config as shown below.
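For reference, a minimal sketch of the random initialization the suggested line refers to, assuming default BERT configurations for both sides (an illustrative choice, not prescribed by the PR):

```python
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Default, untrained BERT configs for the encoder and the decoder (illustrative choice)
config_encoder = BertConfig()
config_decoder = BertConfig()

# Combine them into a single encoder-decoder config and randomly initialize the model
config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
model = EncoderDecoderModel(config=config)
```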
## Training
## Notes
Add a note here about initializing from pretrained encoder/decoder
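A possible sketch for such a note, with the checkpoint names below chosen purely for illustration; `from_encoder_decoder_pretrained` combines a pretrained autoencoding encoder with a pretrained autoregressive decoder, and the cross-attention weights are newly initialized, so the combined model still needs fine-tuning:

```python
from transformers import EncoderDecoderModel

# Checkpoint names are illustrative; any pretrained autoencoding model can serve as the
# encoder and any pretrained autoregressive model as the decoder
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-uncased", "google-bert/bert-base-uncased"
)
```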
>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
```python
import torch
Would be better to adapt the `Pipeline`, `AutoModel`, and `transformers-cli` examples for summarization since that's what the checkpoint was fine-tuned for.
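A hedged sketch of what such a `Pipeline` example might look like; the checkpoint name below is an assumption (a bert2bert model fine-tuned on CNN/DailyMail), not necessarily the one used in the PR:

```python
from transformers import pipeline

# Assumed summarization checkpoint built from an EncoderDecoderModel; substitute the
# checkpoint the model card actually uses
summarizer = pipeline("summarization", model="patrickvonplaten/bert2bert_cnn_daily_mail")
text = "Plants create energy through a process known as photosynthesis..."
print(summarizer(text)[0]["summary_text"])
```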
- The Encoder Decoder Model can also be used for translation of different languages. The example below demonstrates a
I think `Helsinki-NLP/opus-mt-en-de` is already an encoder-decoder model versus combining a separate encoder and decoder model together. You'll either need to find an existing fine-tuned `EncoderDecoderModel` checkpoint for translation or initialize a pretrained encoder/decoder.
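A rough sketch of the second option, assuming multilingual BERT checkpoints (illustrative names); the resulting model would still need fine-tuning on parallel translation data before it can translate:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Illustrative choice: multilingual BERT as both the encoder and the decoder; fine-tune
# on parallel data (e.g. WMT English-German) before using it for translation
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-multilingual-cased",
    "google-bert/bert-base-multilingual-cased",
)
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased")
```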
> This model was contributed by [Molbap](https://huggingface.co/Molbap), with tremendous help from
> [Anton Vlasjuk](https://github.com/vasqu).
> This model was contributed by [Molbap](https://huggingface.co/Molbap), with tremendous help from
> [Anton Vlasjuk](https://github.com/vasqu).
> This model was contributed by [Molbap](https://huggingface.co/Molbap) and [Anton Vlasjuk](https://huggingface.co/AntonV).
What does this PR do?
As described in the issue, this PR updates the model card for the Encoder Decoder Model with an additional translation example. I have also re-added the contributor names for the Mamba and Mamba-2 models, which I had previously removed. Please let me know if any modifications are required and I will make the necessary changes.
Fixes #8944
Refs #36979
Before submitting
Who can review?
@stevhliu