
Update Model Card for Encoder Decoder Model #39272


Open: wants to merge 3 commits into main

Conversation

@ParagEkbote (Contributor) commented Jul 8, 2025

What does this PR do?

As described in the issue, this PR updates the model card for the Encoder Decoder model with an additional translation example. I have also re-added the contributor names for the Mamba and Mamba-2 models, which I had previously removed. Please let me know if any modifications are required and I will make the necessary changes.

Fixes #8944
Refs #36979

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

Who can review?

@stevhliu

@stevhliu (Member) left a comment


Thanks for working on this one!

@@ -14,8 +14,6 @@ rendered properly in your Markdown viewer.
-->

# Encoder Decoder Models

<div class="flex flex-wrap space-x-1">

Wrap the badges with the below to align them to the right

<div style="float: right;">
...
</div>

The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://huggingface.co/papers/1907.12461) by
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) has two main parts:

Suggested change
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) has two main parts:
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) initializes a sequence-to-sequence model with any pretrained autoencoder and pretrained autoregressive model. It is effective for sequence generation tasks as demonstrated in [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) which uses [`BertModel`] as the encoder and decoder.
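For context, a minimal sketch of the warm-starting this wording describes (not part of the PR; `google-bert/bert-base-uncased` is used purely for illustration):

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Warm-start a seq2seq model from two pretrained BERT checkpoints: the encoder keeps its
# weights, while the decoder has cross-attention added and its attention made causal.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-uncased",  # encoder (pretrained autoencoding model)
    "google-bert/bert-base-uncased",  # decoder (reused as a pretrained autoregressive model)
)

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
# Training and generation expect these ids to be set on the composed config.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```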

Comment on lines +29 to +30
Encoder: This part reads the input text and converts it into a set of numerical features capturing the meaning and context
of the input.

Suggested change
Encoder: This part reads the input text and converts it into a set of numerical features capturing the meaning and context
of the input.


An application of this architecture could be to leverage two pretrained [`BertModel`] as the encoder
and decoder for a summarization model as was shown in: [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) by Yang Liu and Mirella Lapata.
Decoder: This part takes the numerical features from the encoder and generates the output text step by step. It uses the information from the encoder to produce meaningful and relevant output, such as a translation, a summary, or an answer.

Suggested change
Decoder: This part takes the numerical features from the encoder and generates the output text step by step. It uses the information from the encoder to produce meaningful and relevant output, such as a translation, a summary, or an answer.

Comment on lines +34 to +35
We can be use this model class to initialize a sequence-to-sequence model with any pretrained autoencoding model as the
encoder and any pretrained autoregressive model as the decoder.

Suggested change
We can be use this model class to initialize a sequence-to-sequence model with any pretrained autoencoding model as the
encoder and any pretrained autoregressive model as the decoder.

Comment on lines +132 to +133
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show
how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.

Suggested change
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show
how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config as shown below.
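A runnable sketch of the random initialization this shortened line points to (it mirrors the `EncoderDecoderConfig` snippet shown further down in the diff):

```python
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Default BERT config for the encoder; the decoder config enables causal decoding and
# cross-attention so it can attend to the encoder's hidden states.
config_encoder = BertConfig()
config_decoder = BertConfig(is_decoder=True, add_cross_attention=True)

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
model = EncoderDecoderModel(config=config)  # all weights are randomly initialized
```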

```

## Training
## Notes

Add a note here about initializing from pretrained encoder/decoder
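One possible shape for that note, sketched with an illustrative checkpoint and output directory name:

```python
from transformers import EncoderDecoderModel

# Initialize from a pretrained encoder and a pretrained decoder instead of random configs.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-uncased", "google-bert/bert-base-uncased"
)

# The composed model saves and reloads like any other checkpoint.
model.save_pretrained("bert2bert")
model = EncoderDecoderModel.from_pretrained("bert2bert")
```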

>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
```python
import torch

Would be better to adapt the Pipeline, AutoModel and transformers-cli examples for summarization since that's what the checkpoint was fine-tuned for
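For example, the Pipeline example could be adapted roughly like this (a sketch; it assumes `patrickvonplaten/bert2bert_cnn_daily_mail` is the summarization-finetuned EncoderDecoderModel checkpoint the card uses):

```python
from transformers import pipeline

# bert2bert EncoderDecoderModel checkpoint fine-tuned on CNN/DailyMail summarization.
summarizer = pipeline(
    task="summarization",
    model="patrickvonplaten/bert2bert_cnn_daily_mail",
)

text = (
    "Plants are among the oldest organisms on Earth. They produce oxygen through "
    "photosynthesis and form the base of most food chains, yet large areas of forest "
    "are still cleared every year."
)
print(summarizer(text, max_length=60, min_length=20)[0]["summary_text"])
```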


- The Encoder Decoder Model can also be used for translation of different languages. The example below demonstrates a

I think Helsinki-NLP/opus-mt-en-de is already an encoder-decoder model versus combining a separate encoder and decoder model together. You'll either need to find an existing finetuned EncoderDecoderModel checkpoint for translation or initialize a pretrained encoder/decoder
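If no finetuned EncoderDecoderModel translation checkpoint turns up, the example could warm-start from a multilingual encoder/decoder and note that fine-tuning on an en-de parallel corpus is still required. A rough sketch, with `google-bert/bert-base-multilingual-cased` chosen only for illustration:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased")

# Warm-start both sides from multilingual BERT; the decoder's cross-attention layers are
# newly added and randomly initialized, so the model needs fine-tuning on parallel data
# (e.g. WMT en-de) before it produces useful translations.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-multilingual-cased",
    "google-bert/bert-base-multilingual-cased",
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```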

Comment on lines +31 to +32
> This model was contributed by [Molbap](https://huggingface.co/Molbap), with tremendous help from
> [Anton Vlasjuk](https://github.com/vasqu).

Suggested change
> This model was contributed by [Molbap](https://huggingface.co/Molbap), with tremendous help from
> [Anton Vlasjuk](https://github.com/vasqu).
> This model was contributed by [Molbap](https://huggingface.co/Molbap) and [Anton Vlasjuk](https://huggingface.co/AntonV).

Development

Successfully merging this pull request may close these issues.

how to use EncoderDecoderModel to do en-de translation?