Add Whisper Speech-to-Text #1114

Status: Closed

Conversation

abheesht17 (Collaborator)

No description provided.

abheesht17 marked this pull request as draft on July 5, 2023 at 22:47.
mattdangerw (Member) left a comment:

I think there are still some correctness issues we are debugging, but I have left some general code comments in the meantime.



@keras_nlp_export("keras_nlp.models.WhisperAudioToSpeechLM")
class WhisperAudioToSpeechLM(GenerativeTask):

mattdangerw (Member):

I think we should probably still call this seq2seq for consistency; after all, we didn't call it BartTextToTextLM.
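
A minimal sketch of the rename being suggested, for concreteness. The `WhisperSeq2SeqLM` name is an assumption modeled on the existing `keras_nlp.models.BartSeq2SeqLM`; this PR was closed before any rename landed:

# Hypothetical naming that follows the library's Seq2SeqLM convention;
# keras_nlp.models.WhisperSeq2SeqLM is an assumed name, not a shipped API.
@keras_nlp_export("keras_nlp.models.WhisperSeq2SeqLM")
class WhisperSeq2SeqLM(GenerativeTask):
    ...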

preprocessor: A `keras_nlp.models.WhisperAudioToSpeechLMPreprocessor` or `None`.
    If `None`, this model will not apply preprocessing, and inputs
    should be preprocessed before calling the model.
"""

mattdangerw (Member):

Given that we have presets, we can probably go ahead and add docstrings. Also keep in mind that after the multi-backend changes, we will favor `np.array` over `tf.constant`.
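
A hedged sketch of the kind of docstring usage example being asked for, written with NumPy inputs rather than `tf.constant`. The task class and the input passed to `generate()` are assumptions based on this draft PR; the `"whisper_tiny_en"` preset name is taken from the library's Whisper presets:

import numpy as np
import keras_nlp

# Hypothetical usage example for the class docstring; this draft task
# class never shipped, and the input format is illustrative only.
audio_lm = keras_nlp.models.WhisperAudioToSpeechLM.from_preset(
    "whisper_tiny_en"
)
# One second of 16 kHz audio as a NumPy array instead of tf.constant.
audio = np.zeros((1, 16000))
outputs = audio_lm.generate(audio)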

class WhisperAudioToSpeechLMPreprocessor(WhisperPreprocessor):
    """Whisper AudioToSpeech LM preprocessor.

    This layer is used as a preprocessor for seq2seq tasks using the Whisper model.

@@ -113,6 +113,8 @@ def __init__(
         self.translate_token_id = special_tokens[translate_token]
         self.transcribe_token_id = special_tokens[transcribe_token]
+
+        self.end_token_id = self.eos_token_id

mattdangerw (Member):

This is kind of an awkward line. Can we just update the whole layer to use `end_token_id` and `start_token_id`? That will be more consistent with the rest of the library.
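
A sketch of what that cleanup might look like inside the tokenizer's __init__. The `bos_token` and `eos_token` names are assumptions inferred from the surrounding code, not the merged implementation:

# Hypothetical refactor per the review: store the ids under the
# library-wide names up front instead of aliasing eos_token_id afterwards.
self.start_token_id = special_tokens[bos_token]
self.end_token_id = special_tokens[eos_token]
self.translate_token_id = special_tokens[translate_token]
self.transcribe_token_id = special_tokens[transcribe_token]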

abheesht17 (Collaborator, Author):

Closing this, will open a new PR.

abheesht17 closed this on October 25, 2023.