Add Whisper Speech-to-Text #1114
Conversation
I think there are still some correctness issues we are debugging, but I left some general code comments in the meantime.
@keras_nlp_export("keras_nlp.models.WhisperAudioToSpeechLM")
class WhisperAudioToSpeechLM(GenerativeTask):
I think we should still probably call this seq2seq for consistency; after all, we didn't call it BartTextToTextLM.
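For illustration, a rough sketch of the rename, mirroring the library's existing Bart-style seq2seq task naming (the WhisperSeq2SeqLM name and the import paths below are assumptions, not the final API):

# Hypothetical rename following the seq2seq naming convention used elsewhere
# in the library; import paths are assumed from the library layout at the time.
from keras_nlp.api_export import keras_nlp_export
from keras_nlp.models.generative_task import GenerativeTask

@keras_nlp_export("keras_nlp.models.WhisperSeq2SeqLM")
class WhisperSeq2SeqLM(GenerativeTask):
    """An end-to-end Whisper model for seq2seq audio-to-text generation."""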
    preprocessor: A `keras_nlp.models.WhisperAudioToSpeechLMPreprocessor` or `None`.
        If `None`, this model will not apply preprocessing, and inputs
        should be preprocessed before calling the model.
    """
Given that we have presets, we can probably go ahead and add docstrings. Also keep in mind that after the multi-backend changes, we will favor np.array over tf.constant.
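For example, a docstring usage snippet along these lines could work; the preset name, input key, and audio shape below are illustrative assumptions about this PR's API, with np.array in place of tf.constant:

import numpy as np

import keras_nlp

# Hypothetical docstring example; the preset name and the "encoder_audio"
# input key are assumptions, not confirmed names from this PR.
audio_lm = keras_nlp.models.WhisperAudioToSpeechLM.from_preset("whisper_tiny_en")

# A batch of raw audio passed as np.array rather than tf.constant, per the
# multi-backend note above.
audio = np.random.uniform(size=(1, 16000)).astype("float32")
audio_lm.generate({"encoder_audio": audio})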
class WhisperAudioToSpeechLMPreprocessor(WhisperPreprocessor):
    """Whisper AudioToSpeech LM preprocessor.

    This layer is used as preprocessor for seq2seq tasks using the Whisper model.
Maybe use this language instead, it's a little more up to date: https://github.com/keras-team/keras-nlp/blob/03668b87b6aae0e4d87cda041e07e2dc81736331/keras_nlp/models/gpt2/gpt2_causal_lm_preprocessor.py#L33-L43
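Roughly, the adapted framing could read something like the following; this is an approximation of the linked GPT-2 preprocessor wording rather than a quote, and the (x, y, sample_weight) description is an assumption about this preprocessor's output format:

# Import path is assumed; WhisperPreprocessor is the existing base layer.
from keras_nlp.models import WhisperPreprocessor


class WhisperAudioToSpeechLMPreprocessor(WhisperPreprocessor):
    """Whisper AudioToSpeech LM preprocessor.

    This preprocessing layer is meant for use with
    `keras_nlp.models.WhisperAudioToSpeechLM`. By default, it will take in
    batches of audio and target strings, and return outputs in a
    `(x, y, sample_weight)` format, where the `y` label is the next decoder
    token id in the `x` decoder sequence.
    """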
@@ -113,6 +113,8 @@ def __init__(
        self.translate_token_id = special_tokens[translate_token]
        self.transcribe_token_id = special_tokens[transcribe_token]

        self.end_token_id = self.eos_token_id
This is kind of an awkward line. Can we just update the whole layer to use end_token_id and start_token_id? That will be more consistent with the rest of the library.
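A minimal sketch of that suggestion, showing only the relevant assignments in the tokenizer __init__ (the special-token variable names bos_token and eos_token are assumptions about the surrounding code):

class WhisperTokenizer:
    # Minimal sketch; only the relevant lines are shown, and the constructor
    # arguments are assumptions about the surrounding tokenizer code.
    def __init__(
        self, special_tokens, bos_token, eos_token, translate_token,
        transcribe_token, **kwargs
    ):
        self.translate_token_id = special_tokens[translate_token]
        self.transcribe_token_id = special_tokens[transcribe_token]
        # Use the library-wide names directly instead of aliasing
        # eos_token_id after the fact.
        self.start_token_id = special_tokens[bos_token]
        self.end_token_id = special_tokens[eos_token]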
Closing this, will open a new PR.