
Update Model Card for Encoder Decoder Model #39272


Open: wants to merge 3 commits into main

Conversation

@ParagEkbote (Contributor) commented Jul 8, 2025

What does this PR do?

As described in the issue, this PR updates the model card for the Encoder Decoder model with an additional translation example. I have also re-added the contributor names for the Mamba and Mamba-2 models, which I had previously removed. Please let me know if any modifications are required and I will make the necessary changes.

Fixes #8944
Refs #36979

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

Who can review?

@stevhliu

@stevhliu (Member) left a comment


Thanks for working on this one!

@@ -14,8 +14,6 @@ rendered properly in your Markdown viewer.
-->

# Encoder Decoder Models

<div class="flex flex-wrap space-x-1">

Wrap the badges with the below to align them to the right

<div style="float: right;">
...
</div>

The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://huggingface.co/papers/1907.12461) by
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) has two main parts:

Suggested change
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) has two main parts:
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) initializes a sequence-to-sequence model with any pretrained autoencoder and pretrained autoregressive model. It is effective for sequence generation tasks as demonstrated in [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) which uses [`BertModel`] as the encoder and decoder.
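For context, a minimal sketch of the warm-starting this wording describes (not part of the PR; `google-bert/bert-base-uncased` is used purely for illustration):

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Warm-start a seq2seq model from two pretrained BERT checkpoints: the encoder keeps its
# weights, while the decoder has cross-attention added and its attention made causal.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-uncased",  # encoder (pretrained autoencoding model)
    "google-bert/bert-base-uncased",  # decoder (reused as a pretrained autoregressive model)
)

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
# Training and generation expect these ids to be set on the composed config.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```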

Comment on lines +29 to +30
Encoder: This part reads the input text and converts it into a set of numerical features capturing the meaning and context
of the input.

Suggested change
Encoder: This part reads the input text and converts it into a set of numerical features capturing the meaning and context
of the input.


An application of this architecture could be to leverage two pretrained [`BertModel`] as the encoder
and decoder for a summarization model as was shown in: [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) by Yang Liu and Mirella Lapata.
Decoder: This part takes the numerical features from the encoder and generates the output text step by step. It uses the information from the encoder to produce meaningful and relevant output, such as a translation, a summary, or an answer.

Suggested change
Decoder: This part takes the numerical features from the encoder and generates the output text step by step. It uses the information from the encoder to produce meaningful and relevant output, such as a translation, a summary, or an answer.

Comment on lines +34 to +35
We can be use this model class to initialize a sequence-to-sequence model with any pretrained autoencoding model as the
encoder and any pretrained autoregressive model as the decoder.

Suggested change
We can be use this model class to initialize a sequence-to-sequence model with any pretrained autoencoding model as the
encoder and any pretrained autoregressive model as the decoder.

Comment on lines +132 to +133
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show
how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.

Suggested change
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show
how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config as shown below.
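A runnable sketch of the random initialization this shortened line points to (it mirrors the `EncoderDecoderConfig` snippet shown further down in the diff):

```python
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Default BERT config for the encoder; the decoder config enables causal decoding and
# cross-attention so it can attend to the encoder's hidden states.
config_encoder = BertConfig()
config_decoder = BertConfig(is_decoder=True, add_cross_attention=True)

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
model = EncoderDecoderModel(config=config)  # all weights are randomly initialized
```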

```

## Training
## Notes

Add a note here about initializing from pretrained encoder/decoder
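One possible shape for that note, sketched with an illustrative checkpoint and output directory name:

```python
from transformers import EncoderDecoderModel

# Initialize from a pretrained encoder and a pretrained decoder instead of random configs.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-uncased", "google-bert/bert-base-uncased"
)

# The composed model saves and reloads like any other checkpoint.
model.save_pretrained("bert2bert")
model = EncoderDecoderModel.from_pretrained("bert2bert")
```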

>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
```python
import torch

Would be better to adapt the Pipeline, AutoModel and transformers-cli examples for summarization since that's what the checkpoint was fine-tuned for
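For example, the Pipeline example could be adapted roughly like this (a sketch; it assumes `patrickvonplaten/bert2bert_cnn_daily_mail` is the summarization-finetuned EncoderDecoderModel checkpoint the card uses):

```python
from transformers import pipeline

# bert2bert EncoderDecoderModel checkpoint fine-tuned on CNN/DailyMail summarization.
summarizer = pipeline(
    task="summarization",
    model="patrickvonplaten/bert2bert_cnn_daily_mail",
)

text = (
    "Plants are among the oldest organisms on Earth. They produce oxygen through "
    "photosynthesis and form the base of most food chains, yet large areas of forest "
    "are still cleared every year."
)
print(summarizer(text, max_length=60, min_length=20)[0]["summary_text"])
```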


- The Encoder Decoder Model can also be used for translation of different languages. The example below demonstrates a

I think Helsinki-NLP/opus-mt-en-de is already an encoder-decoder model versus combining a separate encoder and decoder model together. You'll either need to find an existing finetuned EncoderDecoderModel checkpoint for translation or initialize a pretrained encoder/decoder
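If no finetuned EncoderDecoderModel translation checkpoint turns up, the example could warm-start from a multilingual encoder/decoder and note that fine-tuning on an en-de parallel corpus is still required. A rough sketch, with `google-bert/bert-base-multilingual-cased` chosen only for illustration:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased")

# Warm-start both sides from multilingual BERT; the decoder's cross-attention layers are
# newly added and randomly initialized, so the model needs fine-tuning on parallel data
# (e.g. WMT en-de) before it produces useful translations.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-multilingual-cased",
    "google-bert/bert-base-multilingual-cased",
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```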

Comment on lines +31 to +32
> This model was contributed by [Molbap](https://huggingface.co/Molbap), with tremendous help from
> [Anton Vlasjuk](https://github.com/vasqu).

Suggested change
> This model was contributed by [Molbap](https://huggingface.co/Molbap), with tremendous help from
> [Anton Vlasjuk](https://github.com/vasqu).
> This model was contributed by [Molbap](https://huggingface.co/Molbap) and [Anton Vlasjuk](https://huggingface.co/AntonV).

Development

Successfully merging this pull request may close these issues.

how to use EncoderDecoderModel to do en-de translation?