@BenoitWang (Collaborator) commented on May 16, 2024

Hi @mravanelli, here's the LTU-AS PR as discussed. I am collecting several new datasets and will start a new round of training, but this may take time, so in the meantime I am opening this PR and will carry on little by little. @poonehmousavi you are welcome to review the PR as well 😊.

What does this PR do?

  1. Add a recipe for training the LTU-AS model (an LLM that jointly understands audio and speech).
  2. Slight modifications to the LinearWarmupScheduler class (see the sketch after this list).
  3. Adapt the MultiWOZ llama2 recipe to the latest changes.
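
For context on point 2, the idea behind a linear warmup schedule is sketched below in plain PyTorch using `LambdaLR`. This is not the SpeechBrain `LinearWarmupScheduler` implementation; the warmup/decay shape and the step counts are illustrative assumptions only.

```python
# Minimal sketch of linear warmup + linear decay, NOT SpeechBrain's
# LinearWarmupScheduler; step counts below are illustrative assumptions.
import torch

model = torch.nn.Linear(16, 16)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

num_warmup_steps = 100                               # assumed warmup length
num_training_steps = 1000                            # assumed total steps

def lr_lambda(step):
    # Ramp the LR linearly from 0 to its base value during warmup,
    # then decay it linearly back to 0 over the remaining steps.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(
        0.0,
        (num_training_steps - step)
        / max(1, num_training_steps - num_warmup_steps),
    )

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(num_training_steps):
    # ... forward / backward / optimizer.step() would go here ...
    scheduler.step()
```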

To be done

  • For now, the model is trained with only about half the data used in the paper. Although access to some datasets seems limited, a new training round needs to be carried out once more datasets have been collected.
  • Prepare downloadable JSON files to facilitate the data preparation stage (an illustrative example is sketched after this list).
  • Add a small validation set for stages 1 and 2.
  • Implement an evaluation at the end of stage 3 and prepare the evaluation data.
  • Upload training logs and prepare a HuggingFace interface.
  • Add recipe tests.
  • Update the results and training details in the README.
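
The schema of the downloadable JSON files is not fixed in this PR; purely as an illustration, a data-preparation manifest for an instruction-following audio LLM could look like the snippet below (all keys and values are hypothetical).

```python
# Purely illustrative manifest entry; the PR does not specify the schema,
# so the keys below ("audio_path", "instruction", "answer") are hypothetical.
import json

manifest = {
    "example_0001": {
        "audio_path": "/path/to/audio.wav",
        "instruction": "Describe the sounds and the spoken content in this clip.",
        "answer": "A person is speaking over light rain in the background.",
    }
}

with open("train_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```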

@mravanelli (Collaborator) commented
Thank you @BenoitWang for this contribution. It looks like some tests are failing. Could you please take a look?

@mravanelli self-requested a review on June 17, 2024.
@mravanelli added the enhancement (New feature or request) label on June 17, 2024.