[draft] feat: Implement speech-to-text functionality #526
base: staging
Conversation
- Introduce a new worker `speech-to-text.worker.ts` for handling speech recognition using Hugging Face's Whisper model.
@alamshafil is attempting to deploy a commit to the OpenCut OSS Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough

This update introduces a comprehensive speech-to-text (STT) transcription pipeline to the web application. It adds a new web worker for transcription using Hugging Face's Whisper models, a Zustand store for managing STT state and results, a full-featured captions UI, audio extraction utilities, and transcript data structures. Supporting changes ensure proper state management and persistence.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as Captions UI
    participant Store as Speech-to-Text Store
    participant Worker as STT Worker
    participant HF as HuggingFace Transformers
    UI->>Store: User clicks "Generate Subtitles"
    Store->>Worker: Post message with audio data, model, options
    Worker->>HF: Dynamically import transformers & model
    Worker-->>Worker: Initialize ASR pipeline (if needed)
    Worker->>Worker: Transcribe audio with progress callbacks
    Worker-->>Store: Post progress and final transcript
    Store-->>UI: Update status, display results
```
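To make the diagram above concrete, here is a minimal TypeScript sketch of how a store might hand audio to such a worker and consume its progress and completion messages. The message shapes (`transcribe`, `status`, field names) are illustrative assumptions, not the exact protocol implemented in this PR.

```typescript
// Hypothetical message protocol between the STT store and the worker.
// All field names below are assumptions for illustration only.
type WorkerRequest = {
  type: "transcribe";
  audioData: Float32Array; // mono PCM, typically 16 kHz for Whisper models
  model: string;           // e.g. "onnx-community/whisper-base" (assumed identifier)
  language?: string;
};

type WorkerResponse =
  | { status: "progress"; data: { progress: number } }
  | { status: "complete"; data: { text: string; chunks: { text: string; timestamp: [number, number] }[] } }
  | { status: "error"; data: { message: string } };

function transcribe(worker: Worker, request: WorkerRequest): Promise<WorkerResponse["data"]> {
  return new Promise((resolve, reject) => {
    worker.onmessage = (event: MessageEvent<WorkerResponse>) => {
      const msg = event.data;
      if (msg.status === "progress") {
        // Update store/UI state with msg.data.progress here.
      } else if (msg.status === "complete") {
        resolve(msg.data); // final transcript with timestamped chunks
      } else if (msg.status === "error") {
        reject(new Error(msg.data.message));
      }
    };
    worker.postMessage(request);
  });
}
```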
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 8
🧹 Nitpick comments (6)
apps/web/src/lib/audio-extraction.ts (1)
89-90: Remove redundant length check.

The condition `audioSegments.length > 0` is unnecessary since we've already verified that audioSegments is not empty at lines 82-84.

```diff
- const timelineOffset = audioSegments.length > 0 ? audioSegments[0].startTime : 0;
+ const timelineOffset = audioSegments[0].startTime;
```

apps/web/public/workers/speech-to-text.worker.js (2)
7-21: Consider adding fallback for CDN failures and retry logic

Loading critical dependencies from a CDN introduces a single point of failure. Consider implementing retry logic with exponential backoff and potentially a fallback to a self-hosted version.

Also, the error is thrown after being sent to the main thread (line 19), which might cause the worker to terminate unexpectedly. Consider whether you want to keep the worker alive for potential retry attempts.

```diff
-const initTransformers = async () => {
+const initTransformers = async (retryCount = 0, maxRetries = 3) => {
   try {
-    const transformers = await import('https://cdn.jsdelivr.net/npm/@huggingface/[email protected]/+esm');
+    // Add timeout to prevent hanging
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 30000);
+
+    const transformers = await import('https://cdn.jsdelivr.net/npm/@huggingface/[email protected]/+esm');
+    clearTimeout(timeoutId);
+
     pipeline = transformers.pipeline;
     WhisperTextStreamer = transformers.WhisperTextStreamer;
   } catch (error) {
     console.error('Failed to import transformers:', error);
+
+    if (retryCount < maxRetries) {
+      console.log(`Retrying transformers import (${retryCount + 1}/${maxRetries})...`);
+      await new Promise(resolve => setTimeout(resolve, Math.pow(2, retryCount) * 1000));
+      return initTransformers(retryCount + 1, maxRetries);
+    }
+
     // Send error back to main thread
     self.postMessage({
       status: "error",
       data: { message: "Failed to load AI model dependencies" }
     });
-    throw error;
+    // Don't throw - keep worker alive for potential recovery
   }
 };
```
167-167: Replace nullish coalescing assignment for broader compatibility

The nullish coalescing assignment operator (`??=`) might not be supported in all target environments.

```diff
- start_time ??= performance.now();
+ if (start_time === null) {
+   start_time = performance.now();
+ }
```

apps/web/src/components/editor/media-panel/views/captions.tsx (1)
457-457: Add type annotation for chunk parameter

The chunk parameter in the map function should be properly typed.

```diff
- {result.chunks.map((chunk: any, index: number) => (
+ {result.chunks.map((chunk, index) => (
```

apps/web/src/stores/speech-to-text-store.ts (2)
16-39: Add proper WebGPU type declarations instead of using 'as any'

Using 'as any' bypasses TypeScript's type checking. Consider adding proper WebGPU type declarations.

```diff
+// Add at the top of the file
+interface GPU {
+  requestAdapter(): Promise<GPUAdapter | null>;
+}
+
+interface GPUAdapter {
+  requestDevice(): Promise<GPUDevice>;
+}
+
+interface GPUDevice {
+  destroy(): void;
+}
+
+interface NavigatorGPU {
+  gpu?: GPU;
+}
+
 async function detectWebGPU(): Promise<boolean> {
   try {
-    if (!(navigator as any).gpu) {
+    const nav = navigator as unknown as NavigatorGPU;
+    if (!nav.gpu) {
       return false;
     }
-    const adapter = await (navigator as any).gpu.requestAdapter();
+    const adapter = await nav.gpu.requestAdapter();
     if (!adapter) {
       return false;
     }
```
745-803: Consider reusing AudioContext for better performance

Creating a new AudioContext for each extraction could be expensive if multiple extractions are performed. Consider reusing a single context or implementing a pool.

```diff
+// Add as a module-level variable or store property
+let sharedAudioContext: AudioContext | null = null;
+
+const getAudioContext = (): AudioContext => {
+  if (!sharedAudioContext || sharedAudioContext.state === 'closed') {
+    sharedAudioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+  }
+  return sharedAudioContext;
+};

 extractAudioFromElement: async (element: any, mediaItem: any): Promise<{ audioData: Float32Array; sampleRate: number }> => {
   if (!mediaItem?.file) {
     throw new Error('No media file found for selected element');
   }

   try {
     // Load the audio file
     const arrayBuffer = await mediaItem.file.arrayBuffer();
-    const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+    const audioContext = getAudioContext();
     const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

     // ... rest of the extraction logic ...

-    audioContext.close();
+    // Don't close the shared context
+    // audioContext.close();

     return { audioData: resampledData, sampleRate: targetSampleRate };
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)

- apps/web/package.json (1 hunks)
- apps/web/public/workers/speech-to-text.worker.js (1 hunks)
- apps/web/src/components/editor/media-panel/views/captions.tsx (1 hunks)
- apps/web/src/lib/audio-extraction.ts (1 hunks)
- apps/web/src/lib/storage/storage-service.ts (2 hunks)
- apps/web/src/stores/project-store.ts (2 hunks)
- apps/web/src/stores/speech-to-text-store.ts (1 hunks)
- apps/web/src/types/transcript.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
📚 Learning: applies to **/*.{ts,tsx} : use `import type` for types....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{ts,tsx} : Use `import type` for types.
Applied to files:
apps/web/src/stores/project-store.ts
📚 Learning: the file apps/web/src/components/editor/media-panel/views/media.tsx uses "use client" directive, mak...
Learnt from: khanguyen74
PR: OpenCut-app/OpenCut#466
File: apps/web/src/components/editor/media-panel/views/media.tsx:47-52
Timestamp: 2025-07-26T21:07:57.582Z
Learning: The file apps/web/src/components/editor/media-panel/views/media.tsx uses "use client" directive, making it client-only code where window object is always available, so SSR safety checks are not needed.
Applied to files:
apps/web/src/stores/project-store.ts
apps/web/src/components/editor/media-panel/views/captions.tsx
apps/web/src/stores/speech-to-text-store.ts
📚 Learning: applies to **/*.{ts,tsx} : use `export type` for types....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{ts,tsx} : Use `export type` for types.
Applied to files:
apps/web/src/types/transcript.ts
📚 Learning: applies to **/*.{jsx,tsx} : include caption tracks for audio and video elements....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
apps/web/src/lib/audio-extraction.ts
apps/web/src/stores/speech-to-text-store.ts
📚 Learning: applies to **/*.{jsx,tsx} : give all elements requiring alt text meaningful information for screen r...
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Give all elements requiring alt text meaningful information for screen readers.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : use semantic elements instead of role attributes in jsx....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Use semantic elements instead of role attributes in JSX.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : give heading elements content that's accessible to screen readers (not h...
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Give heading elements content that's accessible to screen readers (not hidden with `aria-hidden`).
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : make sure anchors have content that's accessible to screen readers....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Make sure anchors have content that's accessible to screen readers.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : don't assign non-interactive aria roles to interactive html elements....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Don't assign non-interactive ARIA roles to interactive HTML elements.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
🧬 Code Graph Analysis (2)
apps/web/src/stores/project-store.ts (1)

- apps/web/src/stores/speech-to-text-store.ts (1): useSpeechToTextStore (164-805)

apps/web/src/lib/audio-extraction.ts (2)

- apps/web/src/types/timeline.ts (1): TimelineTrack (82-89)
- apps/web/src/stores/media-store.ts (1): MediaItem (8-26)
🔇 Additional comments (20)
apps/web/package.json (1)
22-22: LGTM!

The addition of the `@huggingface/transformers` dependency is appropriate for implementing the speech-to-text functionality.

apps/web/src/stores/project-store.ts (2)
7-7: LGTM!

The import is correctly placed and follows the same pattern as other store imports.

190-193: LGTM!

Properly resets speech-to-text state when loading a project, maintaining consistency with other store resets.
apps/web/src/lib/storage/storage-service.ts (2)
77-78: LGTM!

Canvas properties are correctly added to project serialization.

101-102: LGTM!

Canvas properties are correctly restored during project deserialization, maintaining consistency with the serialization format.
apps/web/src/types/transcript.ts (3)
35-46: LGTM!

The time conversion function correctly handles the SRT format with proper padding.

51-64: LGTM!

The SRT generation correctly formats transcript chunks into valid SRT format.
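For context, here is a minimal sketch of what an SRT timestamp/segment formatter of this kind typically looks like. It is illustrative only and assumes millisecond inputs; the actual helpers in transcript.ts may differ.

```typescript
// Illustrative sketch only - not the actual transcript.ts implementation.
function toSrtTime(ms: number): string {
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3_600_000);
  const minutes = Math.floor((ms % 3_600_000) / 60_000);
  const seconds = Math.floor((ms % 60_000) / 1000);
  const millis = Math.floor(ms % 1000);
  // SRT uses a comma before the millisecond component: "00:00:01,000"
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)},${pad(millis, 3)}`;
}

function toSrt(chunks: { text: string; startTime: number; endTime: number }[]): string {
  return chunks
    .map((chunk, i) =>
      `${i + 1}\n${toSrtTime(chunk.startTime)} --> ${toSrtTime(chunk.endTime)}\n${chunk.text.trim()}\n`
    )
    .join("\n");
}
```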
69-142: LGTM!

Comprehensive implementation with robust fallback logic for handling various Whisper output formats.
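As background for the fallback handling mentioned above: Whisper-style pipelines can return either timestamped chunks or plain text. A hedged sketch of normalizing both shapes into millisecond-based chunks follows; the helper name and exact field handling are assumptions, not the code under review.

```typescript
// Illustrative only - the real transcript.ts logic may handle more output variants.
type WhisperOutput = {
  text: string;
  chunks?: { text: string; timestamp: [number, number | null] }[];
};

function toTranscriptChunks(output: WhisperOutput) {
  if (output.chunks && output.chunks.length > 0) {
    return output.chunks.map((c) => ({
      text: c.text.trim(),
      startTime: c.timestamp[0] * 1000,                   // seconds -> milliseconds
      endTime: (c.timestamp[1] ?? c.timestamp[0]) * 1000, // end may be null on the final chunk
      words: [],                                          // word-level timing may be absent
    }));
  }
  // Fallback: no timestamps available, wrap the full text in a single chunk
  return [{ text: output.text.trim(), startTime: 0, endTime: 0, words: [] }];
}
```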
apps/web/src/lib/audio-extraction.ts (5)
5-6: LGTM!

Correctly uses `import type` for type imports, following project conventions.

101-113: LGTM!

Proper AudioContext lifecycle management with cleanup in both success and error paths.
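As a reference for the lifecycle pattern being praised here, a minimal sketch of decoding a file with an AudioContext that is closed on both the success and error paths. The function and variable names are assumptions, not the actual audio-extraction.ts code.

```typescript
// Sketch of the cleanup pattern only - not the code under review.
async function decodeFileToBuffer(file: File): Promise<AudioBuffer> {
  const audioContext = new (window.AudioContext ||
    (window as any).webkitAudioContext)();
  try {
    const arrayBuffer = await file.arrayBuffer();
    // decodeAudioData resolves with the fully decoded PCM buffer
    return await audioContext.decodeAudioData(arrayBuffer);
  } finally {
    // Runs on success and on error, so the context never leaks
    await audioContext.close();
  }
}
```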
170-215: LGTM!

Well-implemented audio concatenation with proper mixing and normalization.

220-235: LGTM!

Correct normalization implementation with safe peak level.
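For readers unfamiliar with these preprocessing steps, here is a generic, self-contained sketch of peak normalization and of the linear-interpolation resampling discussed in the next comment. Constants such as the 0.95 peak target and the 16 kHz output rate are illustrative assumptions, not necessarily what audio-extraction.ts uses.

```typescript
// Generic sketch - not the actual audio-extraction.ts implementation.
function normalizePeak(samples: Float32Array, targetPeak = 0.95): Float32Array {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return samples; // silence: nothing to scale
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}

function resampleLinear(samples: Float32Array, fromRate: number, toRate = 16000): Float32Array {
  const ratio = fromRate / toRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const srcPos = i * ratio;
    const i0 = Math.floor(srcPos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = srcPos - i0;
    // Linear interpolation between the two nearest source samples
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}
```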
240-269: LGTM!

Appropriate linear interpolation resampling for speech recognition preprocessing.
apps/web/public/workers/speech-to-text.worker.js (3)
64-103: LGTM!

The message handler properly handles test connectivity, lazy initialization, and error cases with appropriate error message extraction.
233-248: Add safety checks for chunk array access and optimize filtering

The code accesses the last element of the chunks array without checking if it exists after filtering.

```diff
   // Format and validate chunks
   const formattedChunks = chunks
     .filter(chunk => chunk.finalised && chunk.text.trim())
     .map(chunk => {
       const startTime = chunk.timestamp[0];
       const endTime = chunk.timestamp[1] || startTime;
       return {
         text: chunk.text.trim(),
         timestamp: [startTime, endTime]
       };
     });
+
+  // Safely get the last timestamp
+  const lastChunk = formattedChunks[formattedChunks.length - 1];
+  const finalTimestamp = lastChunk ? lastChunk.timestamp[1] : 0;

   return {
     tps,
     text: output.text || '',
     chunks: formattedChunks,
   };
```

⛔ Skipped due to learnings
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Make sure iterable callbacks return consistent values.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `Array#{indexOf,lastIndexOf}()` instead of `Array#{findIndex,findLastIndex}()` when looking for the index of an item.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.403Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use sparse arrays (arrays with holes).
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `String.slice()` instead of `String.substr()` and `String.substring()`.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `String.trimStart()` and `String.trimEnd()` over `String.trimLeft()` and `String.trimRight()`.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Make sure "for" loop update clauses move the counter in the right direction.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use spread (`...`) syntax on accumulators.
133-143: Fix validation logic to avoid optional chaining with non-optional property access

The validation uses optional chaining but then accesses nested properties that might not exist, which could throw runtime errors.

```diff
-  if (!transcriber?.processor?.feature_extractor?.config) {
+  if (!transcriber || !transcriber.processor || !transcriber.processor.feature_extractor || !transcriber.processor.feature_extractor.config) {
     throw new Error('Invalid transcriber configuration - missing feature extractor');
   }

-  if (!transcriber.model?.config?.max_source_positions) {
+  if (!transcriber.model || !transcriber.model.config || !transcriber.model.config.max_source_positions) {
     throw new Error('Invalid transcriber configuration - missing model config');
   }

   const time_precision = transcriber.processor.feature_extractor.config.chunk_length / transcriber.model.config.max_source_positions;
```

⛔ Skipped due to learnings
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
apps/web/src/components/editor/media-panel/views/captions.tsx (2)
87-327: LGTM! Accessibility implementation follows best practices

The component properly uses semantic HTML elements, provides labels for form controls, and includes tooltips for icon buttons, adhering to the accessibility requirements from the retrieved learnings.
350-353: Add safety check for empty chunks array

Accessing the last element of the chunks array without checking if it exists could cause runtime errors.

```diff
   const totalDuration = result.chunks.length > 0
-    ? result.chunks[result.chunks.length - 1]?.timestamp[1]
+    ? result.chunks[result.chunks.length - 1]?.timestamp?.[1] || 0
     : 0;
```

⛔ Skipped due to learnings
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use Date.now() to get milliseconds since the Unix Epoch.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use Date.now() to get milliseconds since the Unix Epoch.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use isNaN() when checking for NaN.
apps/web/src/stores/speech-to-text-store.ts (2)
201-434: LGTM! Robust worker initialization with proper error handling

The worker initialization properly manages promises to prevent race conditions, includes comprehensive error handling, and implements cleanup on failure. Good use of the initialization promise pattern.
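For readers unfamiliar with the pattern mentioned here, a minimal sketch of an initialization-promise guard that lets concurrent callers share a single in-flight worker initialization. The worker path and the "ready" message are assumptions for illustration, not the store's actual API.

```typescript
// Illustrative sketch of the initialization-promise pattern - not the store's actual code.
let worker: Worker | null = null;
let initPromise: Promise<Worker> | null = null;

function initWorker(): Promise<Worker> {
  // Reuse an in-flight initialization so concurrent callers never race.
  if (initPromise) return initPromise;

  initPromise = new Promise<Worker>((resolve, reject) => {
    try {
      const w = new Worker("/workers/speech-to-text.worker.js"); // assumed public path
      w.onmessage = (event) => {
        if (event.data?.status === "ready") { // assumed readiness message
          worker = w;
          resolve(w);
        }
      };
      w.onerror = (err) => {
        initPromise = null; // allow a retry after failure
        reject(err);
      };
    } catch (err) {
      initPromise = null;
      reject(err);
    }
  });

  return initPromise;
}
```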
511-528: LGTM! Proper state cleanup and resource management

The resetState function properly terminates the worker and resets state while preserving device capabilities and model configuration that don't need to be re-detected.
```js
static async getInstance(model, progressCallback) {
  if (this.instance === null || this.currentModel !== model) {
    if (this.instance) {
      try {
        await this.instance.dispose();
      } catch (error) {
        console.warn('Failed to dispose previous model instance:', error);
      }
      this.instance = null;
    }

    this.currentModel = model;
    this.instance = await pipeline("automatic-speech-recognition", model, {
      dtype: {
        encoder_model: model === "onnx-community/whisper-large-v3-turbo" ? "fp16" : "fp32",
        decoder_model_merged: "q4",
      },
      device: "webgpu",
      progress_callback: progressCallback,
    });
  }
  return this.instance;
}
```
Make device configuration dynamic and improve dtype selection
The device is hardcoded to 'webgpu' (line 44) but WebGPU might not be available on all devices. The store detects capabilities but this information isn't passed to the worker. Also, the dtype selection only checks for one specific model.
- static async getInstance(model, progressCallback) {
+ static async getInstance(model, progressCallback, deviceConfig = { device: 'webgpu', dtype: 'fp32' }) {
if (this.instance === null || this.currentModel !== model) {
if (this.instance) {
try {
await this.instance.dispose();
} catch (error) {
console.warn('Failed to dispose previous model instance:', error);
}
this.instance = null;
}
this.currentModel = model;
+
+ // Determine dtype based on model and device
+ const getDtype = (model, device) => {
+ const isLargeModel = model.includes('large') || model.includes('medium');
+ const canUseFp16 = device === 'webgpu' && !model.includes('whisper-base');
+
+ return {
+ encoder_model: canUseFp16 ? 'fp16' : 'fp32',
+ decoder_model_merged: isLargeModel ? 'q4' : 'q8'
+ };
+ };
+
this.instance = await pipeline("automatic-speech-recognition", model, {
- dtype: {
- encoder_model: model === "onnx-community/whisper-large-v3-turbo" ? "fp16" : "fp32",
- decoder_model_merged: "q4",
- },
- device: "webgpu",
+ dtype: deviceConfig.customDtype || getDtype(model, deviceConfig.device),
+ device: deviceConfig.device,
progress_callback: progressCallback,
});
}
return this.instance;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```js
static async getInstance(model, progressCallback, deviceConfig = { device: 'webgpu', dtype: 'fp32' }) {
  if (this.instance === null || this.currentModel !== model) {
    if (this.instance) {
      try {
        await this.instance.dispose();
      } catch (error) {
        console.warn('Failed to dispose previous model instance:', error);
      }
      this.instance = null;
    }

    this.currentModel = model;

    // Determine dtype based on model and device
    const getDtype = (model, device) => {
      const isLargeModel = model.includes('large') || model.includes('medium');
      const canUseFp16 = device === 'webgpu' && !model.includes('whisper-base');

      return {
        encoder_model: canUseFp16 ? 'fp16' : 'fp32',
        decoder_model_merged: isLargeModel ? 'q4' : 'q8'
      };
    };

    this.instance = await pipeline("automatic-speech-recognition", model, {
      dtype: deviceConfig.customDtype || getDtype(model, deviceConfig.device),
      device: deviceConfig.device,
      progress_callback: progressCallback,
    });
  }
  return this.instance;
}
```
🤖 Prompt for AI Agents
In apps/web/public/workers/speech-to-text.worker.js between lines 27 and 49, the
device is hardcoded to 'webgpu', which may not be supported on all devices, and
dtype selection only checks for one specific model. Modify the getInstance
method to accept device capability information as a parameter or retrieve it
dynamically, then use this to set the device option instead of hardcoding
'webgpu'. Also, enhance the dtype selection logic to handle multiple models or
make it configurable rather than only checking for
"onnx-community/whisper-large-v3-turbo".
```js
let start_time = null;
let num_tokens = 0;
let tps;
```
Initialize tps variable to avoid potential undefined value
The `tps` variable is declared without initialization and might remain undefined if no tokens are processed.
let chunk_count = 0;
let start_time = null;
let num_tokens = 0;
- let tps;
+ let tps = 0;
📝 Committable suggestion
```js
let chunk_count = 0;
let start_time = null;
let num_tokens = 0;
let tps = 0;
```
🤖 Prompt for AI Agents
In apps/web/public/workers/speech-to-text.worker.js around lines 151 to 154, the
variable tps is declared but not initialized, which can lead to it being
undefined if no tokens are processed. Initialize tps to a default value such as
0 to ensure it always has a defined numeric value.
```tsx
onClick={async () => {
  if (!selectedElementInfo) {
    return;
  }

  try {
    // Always ensure worker is initialized first
    if (!isWorkerInitialized) {
      await handleInitialize();

      // Wait for the worker to actually be initialized
      // We'll poll the store state until isWorkerInitialized becomes true
      let retries = 0;
      const maxRetries = 50; // 5 seconds max wait
      while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < maxRetries) {
        await new Promise(resolve => setTimeout(resolve, 100));
        retries++;
      }

      if (!useSpeechToTextStore.getState().isWorkerInitialized) {
        console.error('Worker initialization timed out');
        return;
      }
    }

    // Then process the element
    await handleProcess();
  } catch (error) {
    console.error('Error in generate subtitles:', error);
  }
}}
```
🛠️ Refactor suggestion
Extract complex initialization logic and add timeout safeguards
The button's onClick handler contains complex initialization and polling logic with magic numbers. This should be extracted to a separate function for better maintainability.
+ const initializeAndProcess = async () => {
+ const MAX_RETRIES = 50;
+ const RETRY_DELAY_MS = 100;
+
+ if (!selectedElementInfo) {
+ return;
+ }
+
+ try {
+ // Always ensure worker is initialized first
+ if (!isWorkerInitialized) {
+ await handleInitialize();
+
+ // Wait for the worker to actually be initialized
+ let retries = 0;
+ while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < MAX_RETRIES) {
+ await new Promise(resolve => setTimeout(resolve, RETRY_DELAY_MS));
+ retries++;
+ }
+
+ if (!useSpeechToTextStore.getState().isWorkerInitialized) {
+ throw new Error('Worker initialization timed out');
+ }
+ }
+
+ // Then process the element
+ await handleProcess();
+ } catch (error) {
+ console.error('Error in generate subtitles:', error);
+ // Consider showing user-friendly error message
+ }
+ };
<Button
- onClick={async () => {
- if (!selectedElementInfo) {
- return;
- }
-
- try {
- // Always ensure worker is initialized first
- if (!isWorkerInitialized) {
- await handleInitialize();
-
- // Wait for the worker to actually be initialized
- // We'll poll the store state until isWorkerInitialized becomes true
- let retries = 0;
- const maxRetries = 50; // 5 seconds max wait
- while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < maxRetries) {
- await new Promise(resolve => setTimeout(resolve, 100));
- retries++;
- }
-
- if (!useSpeechToTextStore.getState().isWorkerInitialized) {
- console.error('Worker initialization timed out');
- return;
- }
- }
-
- // Then process the element
- await handleProcess();
- } catch (error) {
- console.error('Error in generate subtitles:', error);
- }
- }}
+ onClick={initializeAndProcess}
📝 Committable suggestion
```tsx
// --- extract complex init + retry logic into a helper ---
const initializeAndProcess = async () => {
  const MAX_RETRIES = 50;
  const RETRY_DELAY_MS = 100;

  if (!selectedElementInfo) {
    return;
  }

  try {
    // Always ensure worker is initialized first
    if (!isWorkerInitialized) {
      await handleInitialize();

      // Wait (with a cap) for the worker to actually be initialized
      let retries = 0;
      while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < MAX_RETRIES) {
        await new Promise(resolve => setTimeout(resolve, RETRY_DELAY_MS));
        retries++;
      }

      if (!useSpeechToTextStore.getState().isWorkerInitialized) {
        throw new Error('Worker initialization timed out');
      }
    }

    // Then process the element
    await handleProcess();
  } catch (error) {
    console.error('Error in generate subtitles:', error);
    // TODO: display a user-friendly message in the UI
  }
};

<Button
  onClick={initializeAndProcess}
  /* …other props… */
>
  Generate Subtitles
</Button>
```
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
172 to 202, the onClick handler has complex worker initialization and polling
logic with hardcoded retry counts and delays. Extract this logic into a separate
async function that handles worker initialization with a configurable timeout
and retry mechanism. Replace the inline code in onClick with a call to this new
function to improve readability and maintainability.
```tsx
interface CaptionResultCardProps {
  result: any; // TranscriptionResult type from store
  onRemove: () => void;
  onInsertToTimeline: () => void;
  onDownloadSRT: () => void;
  onViewSegments: () => void;
}
```
Replace 'any' type with proper TypeScript interface
Using 'any' type defeats TypeScript's type safety. Import and use the proper TranscriptionResult type.
+import type { TranscriptionResult } from "@/stores/speech-to-text-store";
+
interface CaptionResultCardProps {
- result: any; // TranscriptionResult type from store
+ result: TranscriptionResult;
onRemove: () => void;
onInsertToTimeline: () => void;
onDownloadSRT: () => void;
onViewSegments: () => void;
}
📝 Committable suggestion
```tsx
import type { TranscriptionResult } from "@/stores/speech-to-text-store";

interface CaptionResultCardProps {
  result: TranscriptionResult;
  onRemove: () => void;
  onInsertToTimeline: () => void;
  onDownloadSRT: () => void;
  onViewSegments: () => void;
}
```
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
329 to 335, replace the 'any' type used for the 'result' prop in
CaptionResultCardProps with the correct TranscriptionResult interface. Import
the TranscriptionResult type from the appropriate store module and update the
type annotation to use it instead of 'any' to ensure proper type safety.
```tsx
interface SegmentViewerProps {
  result: any; // TranscriptionResult
  onClose: () => void;
}
```
Use proper TypeScript types instead of 'any'
Replace 'any' types with proper interfaces for better type safety.
+import type { TranscriptionResult } from "@/stores/speech-to-text-store";
+
interface SegmentViewerProps {
- result: any; // TranscriptionResult
+ result: TranscriptionResult | undefined;
onClose: () => void;
}
📝 Committable suggestion
```tsx
import type { TranscriptionResult } from "@/stores/speech-to-text-store";

interface SegmentViewerProps {
  result: TranscriptionResult | undefined;
  onClose: () => void;
}
```
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
431 to 434, the SegmentViewerProps interface uses 'any' for the 'result'
property, which reduces type safety. Replace 'any' with the correct TypeScript
interface that represents the transcription result, such as
'TranscriptionResult' or the appropriate type defined elsewhere in the codebase,
to ensure proper type checking and improve code reliability.
```ts
if (actualDurationSamples <= 0) {
  // Return silent buffer if no valid audio range
  const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)(); const silentBuffer = audioContext.createBuffer(
    audioBuffer.numberOfChannels,
    1,
    sampleRate
  );
  audioContext.close();
  return silentBuffer;
}
```
🛠️ Refactor suggestion
Fix formatting and optimize AudioContext usage.
There's a formatting issue at line 136 and the function creates multiple AudioContext instances.
if (actualDurationSamples <= 0) {
// Return silent buffer if no valid audio range
- const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)(); const silentBuffer = audioContext.createBuffer(
+ const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+ const silentBuffer = audioContext.createBuffer(
audioBuffer.numberOfChannels,
1,
sampleRate
);
audioContext.close();
return silentBuffer;
}
Also consider creating a single AudioContext at the beginning of the function to avoid multiple instantiations.
📝 Committable suggestion
```ts
if (actualDurationSamples <= 0) {
  // Return silent buffer if no valid audio range
  const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
  const silentBuffer = audioContext.createBuffer(
    audioBuffer.numberOfChannels,
    1,
    sampleRate
  );
  audioContext.close();
  return silentBuffer;
}
```
🤖 Prompt for AI Agents
In apps/web/src/lib/audio-extraction.ts around lines 134 to 143, fix the
formatting issue at line 136 by properly indenting the code and splitting the
combined statements into separate lines. Additionally, refactor the function to
create a single AudioContext instance at the beginning and reuse it throughout
the function to avoid multiple instantiations, ensuring to close the
AudioContext appropriately after use.
```ts
insertResultToTimeline: (resultId: string, mode: 'sentences' | 'words' = 'sentences') => {
  const state = get();
  const result = state.results.find(r => r.id === resultId);

  if (!result) {
    return;
  }

  const timelineStore = useTimelineStore.getState();
  const elementInfo = state.getSelectedElementInfo();

  if (!elementInfo) {
    console.warn('No element selected - cannot determine timeline position');
    return;
  }

  // Calculate timing offset based on the element's position in the timeline
  const element = elementInfo.element;

  // IMPORTANT: The timestamps from transcription are relative to the EXTRACTED audio
  // which starts from trimStart, not from the beginning of the original file
  // So we need to add: element.startTime (timeline position)
  // Note: We DON'T add trimStart because the transcription timestamps are already relative to the extracted portion
  const timelineOffset = element.startTime;

  // Prepare all text elements in advance
  const textElements: TextElement[] = [];

  if (mode === 'words') {
    // Insert individual words as text elements
    let wordCount = 0;
    result.transcript.chunks.forEach((chunk) => {
      chunk.words.forEach((word) => {
        const startTime = timelineOffset + (word.startTime / 1000); // Convert from ms to seconds
        const duration = (word.endTime - word.startTime) / 1000; // Convert from ms to seconds

        const textElement: TextElement = {
          id: crypto.randomUUID(),
          type: 'text',
          name: `Word ${wordCount + 1}: ${word.text}`,
          content: word.text,
          duration: Math.max(duration, 0.5), // Minimum 0.5 second duration for words
          startTime: startTime,
          trimStart: 0,
          trimEnd: 0,
          fontSize: 36,
          fontFamily: 'Arial',
          color: '#ffffff',
          backgroundColor: 'rgba(0, 0, 0, 0.7)',
          textAlign: 'center' as const,
          fontWeight: 'bold' as const,
          fontStyle: 'normal' as const,
          textDecoration: 'none' as const,
          x: 0,
          y: 200,
          rotation: 0,
          opacity: 1
        };

        textElements.push(textElement);
        wordCount++;
      });
    });

  } else {
    // Insert sentence chunks as text elements
    result.chunks.forEach((chunk, index) => {
      const duration = chunk.timestamp[1] - chunk.timestamp[0];
      // Add the timeline offset to get the absolute position
      const startTime = timelineOffset + chunk.timestamp[0];

      // Create a more descriptive name for the text element
      const words = chunk.text.trim().split(/\s+/);
      const shortText = words.length > 3
        ? words.slice(0, 3).join(' ') + '...'
        : chunk.text;

      const textElement: TextElement = {
        id: crypto.randomUUID(),
        type: 'text',
        name: `Subtitle ${index + 1}: ${shortText}`,
        content: chunk.text.trim(),
        duration: Math.max(duration, 1), // Ensure minimum 1 second duration
        startTime: startTime,
        trimStart: 0,
        trimEnd: 0,
        fontSize: 36,
        fontFamily: 'Arial',
        color: '#ffffff',
        backgroundColor: 'rgba(0, 0, 0, 0.7)', // Semi-transparent background for readability
        textAlign: 'center' as const,
        fontWeight: 'bold' as const,
        fontStyle: 'normal' as const,
        textDecoration: 'none' as const,
        x: 0, // Center horizontally
        y: 200, // Position at bottom (positive y moves down from center)
        rotation: 0,
        opacity: 1
      };

      textElements.push(textElement);
    });
  }
```
🛠️ Refactor suggestion
Extract text element styling to constants for better maintainability
The text element creation has many hardcoded style values that should be extracted to constants or configuration.
+// Add at the top of the file or in a separate config
+const DEFAULT_TEXT_ELEMENT_STYLE = {
+ fontSize: 36,
+ fontFamily: 'Arial',
+ color: '#ffffff',
+ backgroundColor: 'rgba(0, 0, 0, 0.7)',
+ textAlign: 'center' as const,
+ fontWeight: 'bold' as const,
+ fontStyle: 'normal' as const,
+ textDecoration: 'none' as const,
+ x: 0,
+ y: 200,
+ rotation: 0,
+ opacity: 1
+};
+
+const MINIMUM_WORD_DURATION = 0.5;
+const MINIMUM_SENTENCE_DURATION = 1.0;
if (mode === 'words') {
// Insert individual words as text elements
let wordCount = 0;
result.transcript.chunks.forEach((chunk) => {
chunk.words.forEach((word) => {
const startTime = timelineOffset + (word.startTime / 1000);
const duration = (word.endTime - word.startTime) / 1000;
const textElement: TextElement = {
id: crypto.randomUUID(),
type: 'text',
name: `Word ${wordCount + 1}: ${word.text}`,
content: word.text,
- duration: Math.max(duration, 0.5),
+ duration: Math.max(duration, MINIMUM_WORD_DURATION),
startTime: startTime,
trimStart: 0,
trimEnd: 0,
- fontSize: 36,
- fontFamily: 'Arial',
- color: '#ffffff',
- backgroundColor: 'rgba(0, 0, 0, 0.7)',
- textAlign: 'center' as const,
- fontWeight: 'bold' as const,
- fontStyle: 'normal' as const,
- textDecoration: 'none' as const,
- x: 0,
- y: 200,
- rotation: 0,
- opacity: 1
+ ...DEFAULT_TEXT_ELEMENT_STYLE
};
🤖 Prompt for AI Agents
In apps/web/src/stores/speech-to-text-store.ts around lines 536 to 638, the text
element creation uses many hardcoded style values such as fontSize, fontFamily,
color, backgroundColor, textAlign, fontWeight, fontStyle, textDecoration, x, y,
rotation, and opacity. To improve maintainability, extract these repeated style
properties into constants or a configuration object at the top of the file or in
a separate styles module. Then, reference these constants when creating text
elements instead of hardcoding the values directly.
```ts
export interface TranscriptWord {
  text: string;
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
}

export interface TranscriptChunk {
  words: TranscriptWord[];
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
  text: string;
}

export interface Transcript {
  id: string;
  chunks: TranscriptChunk[];
  language: string;
  totalDuration: number; // in milliseconds
}

export interface SRTSegment {
  index: number;
  startTime: string; // SRT format: "00:00:01,000"
  endTime: string; // SRT format: "00:00:02,000"
  text: string;
}
```
🛠️ Refactor suggestion
Use `export type` for type exports.

According to the project's TypeScript conventions, types should be exported using `export type`.

Apply this diff to follow the project's type export convention:
-export interface TranscriptWord {
+export type TranscriptWord = {
text: string;
startTime: number; // in milliseconds
endTime: number; // in milliseconds
-}
+};
-export interface TranscriptChunk {
+export type TranscriptChunk = {
words: TranscriptWord[];
startTime: number; // in milliseconds
endTime: number; // in milliseconds
text: string;
-}
+};
-export interface Transcript {
+export type Transcript = {
id: string;
chunks: TranscriptChunk[];
language: string;
totalDuration: number; // in milliseconds
-}
+};
-export interface SRTSegment {
+export type SRTSegment = {
index: number;
startTime: string; // SRT format: "00:00:01,000"
endTime: string; // SRT format: "00:00:02,000"
text: string;
-}
+};
📝 Committable suggestion
```ts
export type TranscriptWord = {
  text: string;
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
};

export type TranscriptChunk = {
  words: TranscriptWord[];
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
  text: string;
};

export type Transcript = {
  id: string;
  chunks: TranscriptChunk[];
  language: string;
  totalDuration: number; // in milliseconds
};

export type SRTSegment = {
  index: number;
  startTime: string; // SRT format: "00:00:01,000"
  endTime: string; // SRT format: "00:00:02,000"
  text: string;
};
```
🤖 Prompt for AI Agents
In apps/web/src/types/transcript.ts from lines 5 to 30, the interfaces are
exported using `export interface`. To follow the project's TypeScript
conventions, change all `export interface` declarations to `export type`
declarations while keeping the structure intact. This means converting each
interface to a type alias with the same shape and exporting it using `export
type`.
Description
This PR adds speech-to-text (STT) support, enabling transcription of audio tracks using Hugging Face Whisper models. It uses `@huggingface/transformers` in a web worker. Additional code was required to extract audio tracks from video.

Type of change
Screenshots (if applicable)
Checklist:
Additional context
Opening as Draft PR.
Summary by CodeRabbit
New Features
Chores