@BenoitWang (Collaborator) commented on May 16, 2024

Hi @mravanelli, here's the LTU-AS PR as discussed. I am collecting several new datasets and will start a new round of training, but this may take time, so in the meantime I am opening this PR and will carry on little by little. @poonehmousavi you are welcome to review the PR as well 😊.

What does this PR do?

  1. Add a recipe for training the LTU-AS model (an LLM that jointly understands audio and speech).
  2. Slight modifications to the LinearWarmupScheduler class (see the sketch after this list).
  3. Adapt the MultiWOZ llama2 recipe to the latest changes.
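
For context on point 2, the idea behind a linear warmup schedule is sketched below in plain PyTorch using `LambdaLR`. This is not the SpeechBrain `LinearWarmupScheduler` implementation; the warmup/decay shape and the step counts are illustrative assumptions only.

```python
# Minimal sketch of linear warmup + linear decay, NOT SpeechBrain's
# LinearWarmupScheduler; step counts below are illustrative assumptions.
import torch

model = torch.nn.Linear(16, 16)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

num_warmup_steps = 100                               # assumed warmup length
num_training_steps = 1000                            # assumed total steps

def lr_lambda(step):
    # Ramp the LR linearly from 0 to its base value during warmup,
    # then decay it linearly back to 0 over the remaining steps.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(
        0.0,
        (num_training_steps - step)
        / max(1, num_training_steps - num_warmup_steps),
    )

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(num_training_steps):
    # ... forward / backward / optimizer.step() would go here ...
    scheduler.step()
```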

To be done

  • For now, the model is trained with only about half the data used in the paper. Although access to some datasets seems limited, a new training round needs to be carried out once more datasets have been collected.
  • Prepare downloadable JSON files to facilitate the data preparation stage (an illustrative example is sketched after this list).
  • Add a small validation set for stages 1 and 2.
  • Implement an evaluation at the end of stage 3 and prepare the evaluation data.
  • Upload training logs and prepare a HuggingFace interface.
  • Add recipe tests.
  • Update the results and training details in the README.
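
The schema of the downloadable JSON files is not fixed in this PR; purely as an illustration, a data-preparation manifest for an instruction-following audio LLM could look like the snippet below (all keys and values are hypothetical).

```python
# Purely illustrative manifest entry; the PR does not specify the schema,
# so the keys below ("audio_path", "instruction", "answer") are hypothetical.
import json

manifest = {
    "example_0001": {
        "audio_path": "/path/to/audio.wav",
        "instruction": "Describe the sounds and the spoken content in this clip.",
        "answer": "A person is speaking over light rain in the background.",
    }
}

with open("train_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```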

@mravanelli (Collaborator) commented
Thank you @BenoitWang for this contribution. It looks like some tests are failing. Could you please take a look?

@mravanelli self-requested a review on June 17, 2024.
@mravanelli added the enhancement (New feature or request) label on June 17, 2024.