Add Whisper Speech-to-Text #1114

Status: Closed

Conversation

abheesht17 (Collaborator)

No description provided.

abheesht17 marked this pull request as draft on July 5, 2023 at 22:47.
mattdangerw (Member) left a comment:

I think there are still some correctness issues we are debugging, but I have left some general code comments in the meantime.



@keras_nlp_export("keras_nlp.models.WhisperAudioToSpeechLM")
class WhisperAudioToSpeechLM(GenerativeTask):

mattdangerw (Member):

I think we should probably still call this seq2seq for consistency; after all, we didn't call it BartTextToTextLM.
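
A minimal sketch of the rename being suggested, for concreteness. The `WhisperSeq2SeqLM` name is an assumption modeled on the existing `keras_nlp.models.BartSeq2SeqLM`; this PR was closed before any rename landed:

# Hypothetical naming that follows the library's Seq2SeqLM convention;
# keras_nlp.models.WhisperSeq2SeqLM is an assumed name, not a shipped API.
@keras_nlp_export("keras_nlp.models.WhisperSeq2SeqLM")
class WhisperSeq2SeqLM(GenerativeTask):
    ...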

preprocessor: A `keras_nlp.models.WhisperAudioToSpeechLMPreprocessor` or `None`.
    If `None`, this model will not apply preprocessing, and inputs
    should be preprocessed before calling the model.
"""

mattdangerw (Member):

Given that we have presets, we can probably go ahead and add docstrings. Also keep in mind that after the multi-backend changes, we will favor `np.array` over `tf.constant`.
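
A hedged sketch of the kind of docstring usage example being asked for, written with NumPy inputs rather than `tf.constant`. The task class and the input passed to `generate()` are assumptions based on this draft PR; the `"whisper_tiny_en"` preset name is taken from the library's Whisper presets:

import numpy as np
import keras_nlp

# Hypothetical usage example for the class docstring; this draft task
# class never shipped, and the input format is illustrative only.
audio_lm = keras_nlp.models.WhisperAudioToSpeechLM.from_preset(
    "whisper_tiny_en"
)
# One second of 16 kHz audio as a NumPy array instead of tf.constant.
audio = np.zeros((1, 16000))
outputs = audio_lm.generate(audio)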

class WhisperAudioToSpeechLMPreprocessor(WhisperPreprocessor):
    """Whisper AudioToSpeech LM preprocessor.

    This layer is used as a preprocessor for seq2seq tasks using the Whisper model.

@@ -113,6 +113,8 @@ def __init__(
         self.translate_token_id = special_tokens[translate_token]
         self.transcribe_token_id = special_tokens[transcribe_token]
+
+        self.end_token_id = self.eos_token_id

mattdangerw (Member):

This is kind of an awkward line. Can we just update the whole layer to use `end_token_id` and `start_token_id`? That will be more consistent with the rest of the library.
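
A sketch of what that cleanup might look like inside the tokenizer's __init__. The `bos_token` and `eos_token` names are assumptions inferred from the surrounding code, not the merged implementation:

# Hypothetical refactor per the review: store the ids under the
# library-wide names up front instead of aliasing eos_token_id afterwards.
self.start_token_id = special_tokens[bos_token]
self.end_token_id = special_tokens[eos_token]
self.translate_token_id = special_tokens[translate_token]
self.transcribe_token_id = special_tokens[transcribe_token]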

abheesht17 (Collaborator, Author):

Closing this, will open a new PR.

abheesht17 closed this on October 25, 2023.