[draft] feat: Implement speech-to-text functionality #526
base: staging
Conversation
- Introduce a new worker `speech-to-text.worker.ts` for handling speech recognition using Hugging Face's Whisper model.
@alamshafil is attempting to deploy a commit to the OpenCut OSS Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough

This update introduces a comprehensive speech-to-text (STT) transcription pipeline to the web application. It adds a new web worker for transcription using Hugging Face's Whisper models, a Zustand store for managing STT state and results, a full-featured captions UI, audio extraction utilities, and transcript data structures. Supporting changes ensure proper state management and persistence.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as Captions UI
    participant Store as Speech-to-Text Store
    participant Worker as STT Worker
    participant HF as HuggingFace Transformers
    UI->>Store: User clicks "Generate Subtitles"
    Store->>Worker: Post message with audio data, model, options
    Worker->>HF: Dynamically import transformers & model
    Worker-->>Worker: Initialize ASR pipeline (if needed)
    Worker->>Worker: Transcribe audio with progress callbacks
    Worker-->>Store: Post progress and final transcript
    Store-->>UI: Update status, display results
```
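To make the diagram above concrete, here is a minimal TypeScript sketch of how a store might hand audio to such a worker and consume its progress and completion messages. The message shapes (`transcribe`, `status`, field names) are illustrative assumptions, not the exact protocol implemented in this PR.

```typescript
// Hypothetical message protocol between the STT store and the worker.
// All field names below are assumptions for illustration only.
type WorkerRequest = {
  type: "transcribe";
  audioData: Float32Array; // mono PCM, typically 16 kHz for Whisper models
  model: string;           // e.g. "onnx-community/whisper-base" (assumed identifier)
  language?: string;
};

type WorkerResponse =
  | { status: "progress"; data: { progress: number } }
  | { status: "complete"; data: { text: string; chunks: { text: string; timestamp: [number, number] }[] } }
  | { status: "error"; data: { message: string } };

function transcribe(worker: Worker, request: WorkerRequest): Promise<WorkerResponse["data"]> {
  return new Promise((resolve, reject) => {
    worker.onmessage = (event: MessageEvent<WorkerResponse>) => {
      const msg = event.data;
      if (msg.status === "progress") {
        // Update store/UI state with msg.data.progress here.
      } else if (msg.status === "complete") {
        resolve(msg.data); // final transcript with timestamped chunks
      } else if (msg.status === "error") {
        reject(new Error(msg.data.message));
      }
    };
    worker.postMessage(request);
  });
}
```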
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 8
🧹 Nitpick comments (6)
apps/web/src/lib/audio-extraction.ts (1)
89-90: Remove redundant length check.

The condition `audioSegments.length > 0` is unnecessary since we've already verified that audioSegments is not empty at lines 82-84.

```diff
- const timelineOffset = audioSegments.length > 0 ? audioSegments[0].startTime : 0;
+ const timelineOffset = audioSegments[0].startTime;
```

apps/web/public/workers/speech-to-text.worker.js (2)
7-21: Consider adding fallback for CDN failures and retry logic

Loading critical dependencies from a CDN introduces a single point of failure. Consider implementing retry logic with exponential backoff and potentially a fallback to a self-hosted version.

Also, the error is thrown after being sent to the main thread (line 19), which might cause the worker to terminate unexpectedly. Consider whether you want to keep the worker alive for potential retry attempts.

```diff
-const initTransformers = async () => {
+const initTransformers = async (retryCount = 0, maxRetries = 3) => {
   try {
-    const transformers = await import('https://cdn.jsdelivr.net/npm/@huggingface/[email protected]/+esm');
+    // Add timeout to prevent hanging
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 30000);
+
+    const transformers = await import('https://cdn.jsdelivr.net/npm/@huggingface/[email protected]/+esm');
+    clearTimeout(timeoutId);
+
     pipeline = transformers.pipeline;
     WhisperTextStreamer = transformers.WhisperTextStreamer;
   } catch (error) {
     console.error('Failed to import transformers:', error);
+
+    if (retryCount < maxRetries) {
+      console.log(`Retrying transformers import (${retryCount + 1}/${maxRetries})...`);
+      await new Promise(resolve => setTimeout(resolve, Math.pow(2, retryCount) * 1000));
+      return initTransformers(retryCount + 1, maxRetries);
+    }
+
     // Send error back to main thread
     self.postMessage({
       status: "error",
       data: { message: "Failed to load AI model dependencies" }
     });
-    throw error;
+    // Don't throw - keep worker alive for potential recovery
   }
 };
```
167-167: Replace nullish coalescing assignment for broader compatibility

The nullish coalescing assignment operator (`??=`) might not be supported in all target environments.

```diff
- start_time ??= performance.now();
+ if (start_time === null) {
+   start_time = performance.now();
+ }
```

apps/web/src/components/editor/media-panel/views/captions.tsx (1)
457-457: Add type annotation for chunk parameter

The chunk parameter in the map function should be properly typed.

```diff
- {result.chunks.map((chunk: any, index: number) => (
+ {result.chunks.map((chunk, index) => (
```

apps/web/src/stores/speech-to-text-store.ts (2)
16-39: Add proper WebGPU type declarations instead of using 'as any'

Using 'as any' bypasses TypeScript's type checking. Consider adding proper WebGPU type declarations.

```diff
+// Add at the top of the file
+interface GPU {
+  requestAdapter(): Promise<GPUAdapter | null>;
+}
+
+interface GPUAdapter {
+  requestDevice(): Promise<GPUDevice>;
+}
+
+interface GPUDevice {
+  destroy(): void;
+}
+
+interface NavigatorGPU {
+  gpu?: GPU;
+}
+
 async function detectWebGPU(): Promise<boolean> {
   try {
-    if (!(navigator as any).gpu) {
+    const nav = navigator as unknown as NavigatorGPU;
+    if (!nav.gpu) {
       return false;
     }
-    const adapter = await (navigator as any).gpu.requestAdapter();
+    const adapter = await nav.gpu.requestAdapter();
     if (!adapter) {
       return false;
     }
```
745-803: Consider reusing AudioContext for better performance

Creating a new AudioContext for each extraction could be expensive if multiple extractions are performed. Consider reusing a single context or implementing a pool.

```diff
+// Add as a module-level variable or store property
+let sharedAudioContext: AudioContext | null = null;
+
+const getAudioContext = (): AudioContext => {
+  if (!sharedAudioContext || sharedAudioContext.state === 'closed') {
+    sharedAudioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+  }
+  return sharedAudioContext;
+};

 extractAudioFromElement: async (element: any, mediaItem: any): Promise<{ audioData: Float32Array; sampleRate: number }> => {
   if (!mediaItem?.file) {
     throw new Error('No media file found for selected element');
   }

   try {
     // Load the audio file
     const arrayBuffer = await mediaItem.file.arrayBuffer();
-    const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+    const audioContext = getAudioContext();
     const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

     // ... rest of the extraction logic ...

-    audioContext.close();
+    // Don't close the shared context
+    // audioContext.close();

     return { audioData: resampledData, sampleRate: targetSampleRate };
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)

- apps/web/package.json (1 hunks)
- apps/web/public/workers/speech-to-text.worker.js (1 hunks)
- apps/web/src/components/editor/media-panel/views/captions.tsx (1 hunks)
- apps/web/src/lib/audio-extraction.ts (1 hunks)
- apps/web/src/lib/storage/storage-service.ts (2 hunks)
- apps/web/src/stores/project-store.ts (2 hunks)
- apps/web/src/stores/speech-to-text-store.ts (1 hunks)
- apps/web/src/types/transcript.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
📚 Learning: applies to **/*.{ts,tsx} : use `import type` for types....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{ts,tsx} : Use `import type` for types.
Applied to files:
apps/web/src/stores/project-store.ts
📚 Learning: the file apps/web/src/components/editor/media-panel/views/media.tsx uses "use client" directive, mak...
Learnt from: khanguyen74
PR: OpenCut-app/OpenCut#466
File: apps/web/src/components/editor/media-panel/views/media.tsx:47-52
Timestamp: 2025-07-26T21:07:57.582Z
Learning: The file apps/web/src/components/editor/media-panel/views/media.tsx uses "use client" directive, making it client-only code where window object is always available, so SSR safety checks are not needed.
Applied to files:
apps/web/src/stores/project-store.ts
apps/web/src/components/editor/media-panel/views/captions.tsx
apps/web/src/stores/speech-to-text-store.ts
📚 Learning: applies to **/*.{ts,tsx} : use `export type` for types....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{ts,tsx} : Use `export type` for types.
Applied to files:
apps/web/src/types/transcript.ts
📚 Learning: applies to **/*.{jsx,tsx} : include caption tracks for audio and video elements....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
apps/web/src/lib/audio-extraction.ts
apps/web/src/stores/speech-to-text-store.ts
📚 Learning: applies to **/*.{jsx,tsx} : give all elements requiring alt text meaningful information for screen r...
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Give all elements requiring alt text meaningful information for screen readers.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : use semantic elements instead of role attributes in jsx....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Use semantic elements instead of role attributes in JSX.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : give heading elements content that's accessible to screen readers (not h...
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Give heading elements content that's accessible to screen readers (not hidden with `aria-hidden`).
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : make sure anchors have content that's accessible to screen readers....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Make sure anchors have content that's accessible to screen readers.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : don't assign non-interactive aria roles to interactive html elements....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Don't assign non-interactive ARIA roles to interactive HTML elements.
Applied to files:
apps/web/src/components/editor/media-panel/views/captions.tsx
🧬 Code Graph Analysis (2)
apps/web/src/stores/project-store.ts (1)

- apps/web/src/stores/speech-to-text-store.ts (1): useSpeechToTextStore (164-805)

apps/web/src/lib/audio-extraction.ts (2)

- apps/web/src/types/timeline.ts (1): TimelineTrack (82-89)
- apps/web/src/stores/media-store.ts (1): MediaItem (8-26)
🔇 Additional comments (20)
apps/web/package.json (1)
22-22: LGTM!

The addition of the `@huggingface/transformers` dependency is appropriate for implementing the speech-to-text functionality.

apps/web/src/stores/project-store.ts (2)
7-7: LGTM!

The import is correctly placed and follows the same pattern as other store imports.

190-193: LGTM!

Properly resets speech-to-text state when loading a project, maintaining consistency with other store resets.
apps/web/src/lib/storage/storage-service.ts (2)
77-78: LGTM!

Canvas properties are correctly added to project serialization.

101-102: LGTM!

Canvas properties are correctly restored during project deserialization, maintaining consistency with the serialization format.
apps/web/src/types/transcript.ts (3)
35-46: LGTM!

The time conversion function correctly handles the SRT format with proper padding.

51-64: LGTM!

The SRT generation correctly formats transcript chunks into valid SRT format.
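For context, here is a minimal sketch of what an SRT timestamp/segment formatter of this kind typically looks like. It is illustrative only and assumes millisecond inputs; the actual helpers in transcript.ts may differ.

```typescript
// Illustrative sketch only - not the actual transcript.ts implementation.
function toSrtTime(ms: number): string {
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3_600_000);
  const minutes = Math.floor((ms % 3_600_000) / 60_000);
  const seconds = Math.floor((ms % 60_000) / 1000);
  const millis = Math.floor(ms % 1000);
  // SRT uses a comma before the millisecond component: "00:00:01,000"
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)},${pad(millis, 3)}`;
}

function toSrt(chunks: { text: string; startTime: number; endTime: number }[]): string {
  return chunks
    .map((chunk, i) =>
      `${i + 1}\n${toSrtTime(chunk.startTime)} --> ${toSrtTime(chunk.endTime)}\n${chunk.text.trim()}\n`
    )
    .join("\n");
}
```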
69-142: LGTM!

Comprehensive implementation with robust fallback logic for handling various Whisper output formats.
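As background for the fallback handling mentioned above: Whisper-style pipelines can return either timestamped chunks or plain text. A hedged sketch of normalizing both shapes into millisecond-based chunks follows; the helper name and exact field handling are assumptions, not the code under review.

```typescript
// Illustrative only - the real transcript.ts logic may handle more output variants.
type WhisperOutput = {
  text: string;
  chunks?: { text: string; timestamp: [number, number | null] }[];
};

function toTranscriptChunks(output: WhisperOutput) {
  if (output.chunks && output.chunks.length > 0) {
    return output.chunks.map((c) => ({
      text: c.text.trim(),
      startTime: c.timestamp[0] * 1000,                   // seconds -> milliseconds
      endTime: (c.timestamp[1] ?? c.timestamp[0]) * 1000, // end may be null on the final chunk
      words: [],                                          // word-level timing may be absent
    }));
  }
  // Fallback: no timestamps available, wrap the full text in a single chunk
  return [{ text: output.text.trim(), startTime: 0, endTime: 0, words: [] }];
}
```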
apps/web/src/lib/audio-extraction.ts (5)
5-6: LGTM!

Correctly uses `import type` for type imports, following project conventions.

101-113: LGTM!

Proper AudioContext lifecycle management with cleanup in both success and error paths.
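As a reference for the lifecycle pattern being praised here, a minimal sketch of decoding a file with an AudioContext that is closed on both the success and error paths. The function and variable names are assumptions, not the actual audio-extraction.ts code.

```typescript
// Sketch of the cleanup pattern only - not the code under review.
async function decodeFileToBuffer(file: File): Promise<AudioBuffer> {
  const audioContext = new (window.AudioContext ||
    (window as any).webkitAudioContext)();
  try {
    const arrayBuffer = await file.arrayBuffer();
    // decodeAudioData resolves with the fully decoded PCM buffer
    return await audioContext.decodeAudioData(arrayBuffer);
  } finally {
    // Runs on success and on error, so the context never leaks
    await audioContext.close();
  }
}
```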
170-215: LGTM!

Well-implemented audio concatenation with proper mixing and normalization.

220-235: LGTM!

Correct normalization implementation with safe peak level.
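For readers unfamiliar with these preprocessing steps, here is a generic, self-contained sketch of peak normalization and of the linear-interpolation resampling discussed in the next comment. Constants such as the 0.95 peak target and the 16 kHz output rate are illustrative assumptions, not necessarily what audio-extraction.ts uses.

```typescript
// Generic sketch - not the actual audio-extraction.ts implementation.
function normalizePeak(samples: Float32Array, targetPeak = 0.95): Float32Array {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return samples; // silence: nothing to scale
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}

function resampleLinear(samples: Float32Array, fromRate: number, toRate = 16000): Float32Array {
  const ratio = fromRate / toRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const srcPos = i * ratio;
    const i0 = Math.floor(srcPos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = srcPos - i0;
    // Linear interpolation between the two nearest source samples
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}
```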
240-269: LGTM!

Appropriate linear interpolation resampling for speech recognition preprocessing.
apps/web/public/workers/speech-to-text.worker.js (3)
64-103: LGTM!

The message handler properly handles test connectivity, lazy initialization, and error cases with appropriate error message extraction.
233-248: Add safety checks for chunk array access and optimize filtering

The code accesses the last element of the chunks array without checking if it exists after filtering.

```diff
   // Format and validate chunks
   const formattedChunks = chunks
     .filter(chunk => chunk.finalised && chunk.text.trim())
     .map(chunk => {
       const startTime = chunk.timestamp[0];
       const endTime = chunk.timestamp[1] || startTime;
       return {
         text: chunk.text.trim(),
         timestamp: [startTime, endTime]
       };
     });
+
+  // Safely get the last timestamp
+  const lastChunk = formattedChunks[formattedChunks.length - 1];
+  const finalTimestamp = lastChunk ? lastChunk.timestamp[1] : 0;

   return {
     tps,
     text: output.text || '',
     chunks: formattedChunks,
   };
```

⛔ Skipped due to learnings
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Make sure iterable callbacks return consistent values.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `Array#{indexOf,lastIndexOf}()` instead of `Array#{findIndex,findLastIndex}()` when looking for the index of an item.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.403Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use sparse arrays (arrays with holes).
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `String.slice()` instead of `String.substr()` and `String.substring()`.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `String.trimStart()` and `String.trimEnd()` over `String.trimLeft()` and `String.trimRight()`.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Make sure "for" loop update clauses move the counter in the right direction.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use spread (`...`) syntax on accumulators.
133-143: Fix validation logic to avoid optional chaining with non-optional property access

The validation uses optional chaining but then accesses nested properties that might not exist, which could throw runtime errors.

```diff
-  if (!transcriber?.processor?.feature_extractor?.config) {
+  if (!transcriber || !transcriber.processor || !transcriber.processor.feature_extractor || !transcriber.processor.feature_extractor.config) {
     throw new Error('Invalid transcriber configuration - missing feature extractor');
   }

-  if (!transcriber.model?.config?.max_source_positions) {
+  if (!transcriber.model || !transcriber.model.config || !transcriber.model.config.max_source_positions) {
     throw new Error('Invalid transcriber configuration - missing model config');
   }

   const time_precision = transcriber.processor.feature_extractor.config.chunk_length / transcriber.model.config.max_source_positions;
```

⛔ Skipped due to learnings
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
apps/web/src/components/editor/media-panel/views/captions.tsx (2)
87-327: LGTM! Accessibility implementation follows best practices

The component properly uses semantic HTML elements, provides labels for form controls, and includes tooltips for icon buttons, adhering to the accessibility requirements from the retrieved learnings.
350-353: Add safety check for empty chunks array

Accessing the last element of the chunks array without checking if it exists could cause runtime errors.

```diff
   const totalDuration = result.chunks.length > 0
-    ? result.chunks[result.chunks.length - 1]?.timestamp[1]
+    ? result.chunks[result.chunks.length - 1]?.timestamp?.[1] || 0
     : 0;
```

⛔ Skipped due to learnings
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use Date.now() to get milliseconds since the Unix Epoch.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .cursor/rules/ultracite.mdc:0-0 Timestamp: 2025-07-27T22:15:27.748Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use Date.now() to get milliseconds since the Unix Epoch.
Learnt from: CR PR: OpenCut-app/OpenCut#0 File: .github/copilot-instructions.md:0-0 Timestamp: 2025-07-27T22:14:46.402Z Learning: Applies to **/*.{js,jsx,ts,tsx} : Use isNaN() when checking for NaN.
apps/web/src/stores/speech-to-text-store.ts (2)
201-434: LGTM! Robust worker initialization with proper error handling

The worker initialization properly manages promises to prevent race conditions, includes comprehensive error handling, and implements cleanup on failure. Good use of the initialization promise pattern.
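For readers unfamiliar with the pattern mentioned here, a minimal sketch of an initialization-promise guard that lets concurrent callers share a single in-flight worker initialization. The worker path and the "ready" message are assumptions for illustration, not the store's actual API.

```typescript
// Illustrative sketch of the initialization-promise pattern - not the store's actual code.
let worker: Worker | null = null;
let initPromise: Promise<Worker> | null = null;

function initWorker(): Promise<Worker> {
  // Reuse an in-flight initialization so concurrent callers never race.
  if (initPromise) return initPromise;

  initPromise = new Promise<Worker>((resolve, reject) => {
    try {
      const w = new Worker("/workers/speech-to-text.worker.js"); // assumed public path
      w.onmessage = (event) => {
        if (event.data?.status === "ready") { // assumed readiness message
          worker = w;
          resolve(w);
        }
      };
      w.onerror = (err) => {
        initPromise = null; // allow a retry after failure
        reject(err);
      };
    } catch (err) {
      initPromise = null;
      reject(err);
    }
  });

  return initPromise;
}
```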
511-528: LGTM! Proper state cleanup and resource management

The resetState function properly terminates the worker and resets state while preserving device capabilities and model configuration that don't need to be re-detected.
```js
static async getInstance(model, progressCallback) {
  if (this.instance === null || this.currentModel !== model) {
    if (this.instance) {
      try {
        await this.instance.dispose();
      } catch (error) {
        console.warn('Failed to dispose previous model instance:', error);
      }
      this.instance = null;
    }

    this.currentModel = model;
    this.instance = await pipeline("automatic-speech-recognition", model, {
      dtype: {
        encoder_model: model === "onnx-community/whisper-large-v3-turbo" ? "fp16" : "fp32",
        decoder_model_merged: "q4",
      },
      device: "webgpu",
      progress_callback: progressCallback,
    });
  }
  return this.instance;
}
```
Make device configuration dynamic and improve dtype selection
The device is hardcoded to 'webgpu' (line 44) but WebGPU might not be available on all devices. The store detects capabilities but this information isn't passed to the worker. Also, the dtype selection only checks for one specific model.
- static async getInstance(model, progressCallback) {
+ static async getInstance(model, progressCallback, deviceConfig = { device: 'webgpu', dtype: 'fp32' }) {
if (this.instance === null || this.currentModel !== model) {
if (this.instance) {
try {
await this.instance.dispose();
} catch (error) {
console.warn('Failed to dispose previous model instance:', error);
}
this.instance = null;
}
this.currentModel = model;
+
+ // Determine dtype based on model and device
+ const getDtype = (model, device) => {
+ const isLargeModel = model.includes('large') || model.includes('medium');
+ const canUseFp16 = device === 'webgpu' && !model.includes('whisper-base');
+
+ return {
+ encoder_model: canUseFp16 ? 'fp16' : 'fp32',
+ decoder_model_merged: isLargeModel ? 'q4' : 'q8'
+ };
+ };
+
this.instance = await pipeline("automatic-speech-recognition", model, {
- dtype: {
- encoder_model: model === "onnx-community/whisper-large-v3-turbo" ? "fp16" : "fp32",
- decoder_model_merged: "q4",
- },
- device: "webgpu",
+ dtype: deviceConfig.customDtype || getDtype(model, deviceConfig.device),
+ device: deviceConfig.device,
progress_callback: progressCallback,
});
}
return this.instance;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```js
static async getInstance(model, progressCallback, deviceConfig = { device: 'webgpu', dtype: 'fp32' }) {
  if (this.instance === null || this.currentModel !== model) {
    if (this.instance) {
      try {
        await this.instance.dispose();
      } catch (error) {
        console.warn('Failed to dispose previous model instance:', error);
      }
      this.instance = null;
    }

    this.currentModel = model;

    // Determine dtype based on model and device
    const getDtype = (model, device) => {
      const isLargeModel = model.includes('large') || model.includes('medium');
      const canUseFp16 = device === 'webgpu' && !model.includes('whisper-base');

      return {
        encoder_model: canUseFp16 ? 'fp16' : 'fp32',
        decoder_model_merged: isLargeModel ? 'q4' : 'q8'
      };
    };

    this.instance = await pipeline("automatic-speech-recognition", model, {
      dtype: deviceConfig.customDtype || getDtype(model, deviceConfig.device),
      device: deviceConfig.device,
      progress_callback: progressCallback,
    });
  }
  return this.instance;
}
```
🤖 Prompt for AI Agents
In apps/web/public/workers/speech-to-text.worker.js between lines 27 and 49, the
device is hardcoded to 'webgpu', which may not be supported on all devices, and
dtype selection only checks for one specific model. Modify the getInstance
method to accept device capability information as a parameter or retrieve it
dynamically, then use this to set the device option instead of hardcoding
'webgpu'. Also, enhance the dtype selection logic to handle multiple models or
make it configurable rather than only checking for
"onnx-community/whisper-large-v3-turbo".
```js
let start_time = null;
let num_tokens = 0;
let tps;
```
Initialize tps variable to avoid potential undefined value
The `tps` variable is declared without initialization and might remain undefined if no tokens are processed.
let chunk_count = 0;
let start_time = null;
let num_tokens = 0;
- let tps;
+ let tps = 0;
📝 Committable suggestion
```js
let chunk_count = 0;
let start_time = null;
let num_tokens = 0;
let tps = 0;
```
🤖 Prompt for AI Agents
In apps/web/public/workers/speech-to-text.worker.js around lines 151 to 154, the
variable tps is declared but not initialized, which can lead to it being
undefined if no tokens are processed. Initialize tps to a default value such as
0 to ensure it always has a defined numeric value.
```tsx
onClick={async () => {
  if (!selectedElementInfo) {
    return;
  }

  try {
    // Always ensure worker is initialized first
    if (!isWorkerInitialized) {
      await handleInitialize();

      // Wait for the worker to actually be initialized
      // We'll poll the store state until isWorkerInitialized becomes true
      let retries = 0;
      const maxRetries = 50; // 5 seconds max wait
      while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < maxRetries) {
        await new Promise(resolve => setTimeout(resolve, 100));
        retries++;
      }

      if (!useSpeechToTextStore.getState().isWorkerInitialized) {
        console.error('Worker initialization timed out');
        return;
      }
    }

    // Then process the element
    await handleProcess();
  } catch (error) {
    console.error('Error in generate subtitles:', error);
  }
}}
```
🛠️ Refactor suggestion
Extract complex initialization logic and add timeout safeguards
The button's onClick handler contains complex initialization and polling logic with magic numbers. This should be extracted to a separate function for better maintainability.
+ const initializeAndProcess = async () => {
+ const MAX_RETRIES = 50;
+ const RETRY_DELAY_MS = 100;
+
+ if (!selectedElementInfo) {
+ return;
+ }
+
+ try {
+ // Always ensure worker is initialized first
+ if (!isWorkerInitialized) {
+ await handleInitialize();
+
+ // Wait for the worker to actually be initialized
+ let retries = 0;
+ while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < MAX_RETRIES) {
+ await new Promise(resolve => setTimeout(resolve, RETRY_DELAY_MS));
+ retries++;
+ }
+
+ if (!useSpeechToTextStore.getState().isWorkerInitialized) {
+ throw new Error('Worker initialization timed out');
+ }
+ }
+
+ // Then process the element
+ await handleProcess();
+ } catch (error) {
+ console.error('Error in generate subtitles:', error);
+ // Consider showing user-friendly error message
+ }
+ };
<Button
- onClick={async () => {
- if (!selectedElementInfo) {
- return;
- }
-
- try {
- // Always ensure worker is initialized first
- if (!isWorkerInitialized) {
- await handleInitialize();
-
- // Wait for the worker to actually be initialized
- // We'll poll the store state until isWorkerInitialized becomes true
- let retries = 0;
- const maxRetries = 50; // 5 seconds max wait
- while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < maxRetries) {
- await new Promise(resolve => setTimeout(resolve, 100));
- retries++;
- }
-
- if (!useSpeechToTextStore.getState().isWorkerInitialized) {
- console.error('Worker initialization timed out');
- return;
- }
- }
-
- // Then process the element
- await handleProcess();
- } catch (error) {
- console.error('Error in generate subtitles:', error);
- }
- }}
+ onClick={initializeAndProcess}
📝 Committable suggestion
```tsx
// --- extract complex init + retry logic into a helper ---
const initializeAndProcess = async () => {
  const MAX_RETRIES = 50;
  const RETRY_DELAY_MS = 100;

  if (!selectedElementInfo) {
    return;
  }

  try {
    // Always ensure worker is initialized first
    if (!isWorkerInitialized) {
      await handleInitialize();

      // Wait (with a cap) for the worker to actually be initialized
      let retries = 0;
      while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < MAX_RETRIES) {
        await new Promise(resolve => setTimeout(resolve, RETRY_DELAY_MS));
        retries++;
      }

      if (!useSpeechToTextStore.getState().isWorkerInitialized) {
        throw new Error('Worker initialization timed out');
      }
    }

    // Then process the element
    await handleProcess();
  } catch (error) {
    console.error('Error in generate subtitles:', error);
    // TODO: display a user-friendly message in the UI
  }
};

<Button
  onClick={initializeAndProcess}
  /* …other props… */
>
  Generate Subtitles
</Button>
```
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
172 to 202, the onClick handler has complex worker initialization and polling
logic with hardcoded retry counts and delays. Extract this logic into a separate
async function that handles worker initialization with a configurable timeout
and retry mechanism. Replace the inline code in onClick with a call to this new
function to improve readability and maintainability.
```tsx
interface CaptionResultCardProps {
  result: any; // TranscriptionResult type from store
  onRemove: () => void;
  onInsertToTimeline: () => void;
  onDownloadSRT: () => void;
  onViewSegments: () => void;
}
```
Replace 'any' type with proper TypeScript interface
Using 'any' type defeats TypeScript's type safety. Import and use the proper TranscriptionResult type.
+import type { TranscriptionResult } from "@/stores/speech-to-text-store";
+
interface CaptionResultCardProps {
- result: any; // TranscriptionResult type from store
+ result: TranscriptionResult;
onRemove: () => void;
onInsertToTimeline: () => void;
onDownloadSRT: () => void;
onViewSegments: () => void;
}
📝 Committable suggestion
```tsx
import type { TranscriptionResult } from "@/stores/speech-to-text-store";

interface CaptionResultCardProps {
  result: TranscriptionResult;
  onRemove: () => void;
  onInsertToTimeline: () => void;
  onDownloadSRT: () => void;
  onViewSegments: () => void;
}
```
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
329 to 335, replace the 'any' type used for the 'result' prop in
CaptionResultCardProps with the correct TranscriptionResult interface. Import
the TranscriptionResult type from the appropriate store module and update the
type annotation to use it instead of 'any' to ensure proper type safety.
```tsx
interface SegmentViewerProps {
  result: any; // TranscriptionResult
  onClose: () => void;
}
```
Use proper TypeScript types instead of 'any'
Replace 'any' types with proper interfaces for better type safety.
+import type { TranscriptionResult } from "@/stores/speech-to-text-store";
+
interface SegmentViewerProps {
- result: any; // TranscriptionResult
+ result: TranscriptionResult | undefined;
onClose: () => void;
}
📝 Committable suggestion
```tsx
import type { TranscriptionResult } from "@/stores/speech-to-text-store";

interface SegmentViewerProps {
  result: TranscriptionResult | undefined;
  onClose: () => void;
}
```
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
431 to 434, the SegmentViewerProps interface uses 'any' for the 'result'
property, which reduces type safety. Replace 'any' with the correct TypeScript
interface that represents the transcription result, such as
'TranscriptionResult' or the appropriate type defined elsewhere in the codebase,
to ensure proper type checking and improve code reliability.
```ts
if (actualDurationSamples <= 0) {
  // Return silent buffer if no valid audio range
  const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)(); const silentBuffer = audioContext.createBuffer(
    audioBuffer.numberOfChannels,
    1,
    sampleRate
  );
  audioContext.close();
  return silentBuffer;
}
```
🛠️ Refactor suggestion
Fix formatting and optimize AudioContext usage.
There's a formatting issue at line 136 and the function creates multiple AudioContext instances.
if (actualDurationSamples <= 0) {
// Return silent buffer if no valid audio range
- const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)(); const silentBuffer = audioContext.createBuffer(
+ const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+ const silentBuffer = audioContext.createBuffer(
audioBuffer.numberOfChannels,
1,
sampleRate
);
audioContext.close();
return silentBuffer;
}
Also consider creating a single AudioContext at the beginning of the function to avoid multiple instantiations.
📝 Committable suggestion
```ts
if (actualDurationSamples <= 0) {
  // Return silent buffer if no valid audio range
  const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
  const silentBuffer = audioContext.createBuffer(
    audioBuffer.numberOfChannels,
    1,
    sampleRate
  );
  audioContext.close();
  return silentBuffer;
}
```
🤖 Prompt for AI Agents
In apps/web/src/lib/audio-extraction.ts around lines 134 to 143, fix the
formatting issue at line 136 by properly indenting the code and splitting the
combined statements into separate lines. Additionally, refactor the function to
create a single AudioContext instance at the beginning and reuse it throughout
the function to avoid multiple instantiations, ensuring to close the
AudioContext appropriately after use.
```ts
insertResultToTimeline: (resultId: string, mode: 'sentences' | 'words' = 'sentences') => {
  const state = get();
  const result = state.results.find(r => r.id === resultId);

  if (!result) {
    return;
  }

  const timelineStore = useTimelineStore.getState();
  const elementInfo = state.getSelectedElementInfo();

  if (!elementInfo) {
    console.warn('No element selected - cannot determine timeline position');
    return;
  }

  // Calculate timing offset based on the element's position in the timeline
  const element = elementInfo.element;

  // IMPORTANT: The timestamps from transcription are relative to the EXTRACTED audio
  // which starts from trimStart, not from the beginning of the original file
  // So we need to add: element.startTime (timeline position)
  // Note: We DON'T add trimStart because the transcription timestamps are already relative to the extracted portion
  const timelineOffset = element.startTime;

  // Prepare all text elements in advance
  const textElements: TextElement[] = [];

  if (mode === 'words') {
    // Insert individual words as text elements
    let wordCount = 0;
    result.transcript.chunks.forEach((chunk) => {
      chunk.words.forEach((word) => {
        const startTime = timelineOffset + (word.startTime / 1000); // Convert from ms to seconds
        const duration = (word.endTime - word.startTime) / 1000; // Convert from ms to seconds

        const textElement: TextElement = {
          id: crypto.randomUUID(),
          type: 'text',
          name: `Word ${wordCount + 1}: ${word.text}`,
          content: word.text,
          duration: Math.max(duration, 0.5), // Minimum 0.5 second duration for words
          startTime: startTime,
          trimStart: 0,
          trimEnd: 0,
          fontSize: 36,
          fontFamily: 'Arial',
          color: '#ffffff',
          backgroundColor: 'rgba(0, 0, 0, 0.7)',
          textAlign: 'center' as const,
          fontWeight: 'bold' as const,
          fontStyle: 'normal' as const,
          textDecoration: 'none' as const,
          x: 0,
          y: 200,
          rotation: 0,
          opacity: 1
        };

        textElements.push(textElement);
        wordCount++;
      });
    });

  } else {
    // Insert sentence chunks as text elements
    result.chunks.forEach((chunk, index) => {
      const duration = chunk.timestamp[1] - chunk.timestamp[0];
      // Add the timeline offset to get the absolute position
      const startTime = timelineOffset + chunk.timestamp[0];

      // Create a more descriptive name for the text element
      const words = chunk.text.trim().split(/\s+/);
      const shortText = words.length > 3
        ? words.slice(0, 3).join(' ') + '...'
        : chunk.text;

      const textElement: TextElement = {
        id: crypto.randomUUID(),
        type: 'text',
        name: `Subtitle ${index + 1}: ${shortText}`,
        content: chunk.text.trim(),
        duration: Math.max(duration, 1), // Ensure minimum 1 second duration
        startTime: startTime,
        trimStart: 0,
        trimEnd: 0,
        fontSize: 36,
        fontFamily: 'Arial',
        color: '#ffffff',
        backgroundColor: 'rgba(0, 0, 0, 0.7)', // Semi-transparent background for readability
        textAlign: 'center' as const,
        fontWeight: 'bold' as const,
        fontStyle: 'normal' as const,
        textDecoration: 'none' as const,
        x: 0, // Center horizontally
        y: 200, // Position at bottom (positive y moves down from center)
        rotation: 0,
        opacity: 1
      };

      textElements.push(textElement);
    });
  }
```
🛠️ Refactor suggestion
Extract text element styling to constants for better maintainability
The text element creation has many hardcoded style values that should be extracted to constants or configuration.
+// Add at the top of the file or in a separate config
+const DEFAULT_TEXT_ELEMENT_STYLE = {
+ fontSize: 36,
+ fontFamily: 'Arial',
+ color: '#ffffff',
+ backgroundColor: 'rgba(0, 0, 0, 0.7)',
+ textAlign: 'center' as const,
+ fontWeight: 'bold' as const,
+ fontStyle: 'normal' as const,
+ textDecoration: 'none' as const,
+ x: 0,
+ y: 200,
+ rotation: 0,
+ opacity: 1
+};
+
+const MINIMUM_WORD_DURATION = 0.5;
+const MINIMUM_SENTENCE_DURATION = 1.0;
if (mode === 'words') {
// Insert individual words as text elements
let wordCount = 0;
result.transcript.chunks.forEach((chunk) => {
chunk.words.forEach((word) => {
const startTime = timelineOffset + (word.startTime / 1000);
const duration = (word.endTime - word.startTime) / 1000;
const textElement: TextElement = {
id: crypto.randomUUID(),
type: 'text',
name: `Word ${wordCount + 1}: ${word.text}`,
content: word.text,
- duration: Math.max(duration, 0.5),
+ duration: Math.max(duration, MINIMUM_WORD_DURATION),
startTime: startTime,
trimStart: 0,
trimEnd: 0,
- fontSize: 36,
- fontFamily: 'Arial',
- color: '#ffffff',
- backgroundColor: 'rgba(0, 0, 0, 0.7)',
- textAlign: 'center' as const,
- fontWeight: 'bold' as const,
- fontStyle: 'normal' as const,
- textDecoration: 'none' as const,
- x: 0,
- y: 200,
- rotation: 0,
- opacity: 1
+ ...DEFAULT_TEXT_ELEMENT_STYLE
};
🤖 Prompt for AI Agents
In apps/web/src/stores/speech-to-text-store.ts around lines 536 to 638, the text
element creation uses many hardcoded style values such as fontSize, fontFamily,
color, backgroundColor, textAlign, fontWeight, fontStyle, textDecoration, x, y,
rotation, and opacity. To improve maintainability, extract these repeated style
properties into constants or a configuration object at the top of the file or in
a separate styles module. Then, reference these constants when creating text
elements instead of hardcoding the values directly.
```ts
export interface TranscriptWord {
  text: string;
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
}

export interface TranscriptChunk {
  words: TranscriptWord[];
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
  text: string;
}

export interface Transcript {
  id: string;
  chunks: TranscriptChunk[];
  language: string;
  totalDuration: number; // in milliseconds
}

export interface SRTSegment {
  index: number;
  startTime: string; // SRT format: "00:00:01,000"
  endTime: string; // SRT format: "00:00:02,000"
  text: string;
}
```
🛠️ Refactor suggestion
Use `export type` for type exports.

According to the project's TypeScript conventions, types should be exported using `export type`.

Apply this diff to follow the project's type export convention:
-export interface TranscriptWord {
+export type TranscriptWord = {
text: string;
startTime: number; // in milliseconds
endTime: number; // in milliseconds
-}
+};
-export interface TranscriptChunk {
+export type TranscriptChunk = {
words: TranscriptWord[];
startTime: number; // in milliseconds
endTime: number; // in milliseconds
text: string;
-}
+};
-export interface Transcript {
+export type Transcript = {
id: string;
chunks: TranscriptChunk[];
language: string;
totalDuration: number; // in milliseconds
-}
+};
-export interface SRTSegment {
+export type SRTSegment = {
index: number;
startTime: string; // SRT format: "00:00:01,000"
endTime: string; // SRT format: "00:00:02,000"
text: string;
-}
+};
📝 Committable suggestion
```ts
export type TranscriptWord = {
  text: string;
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
};

export type TranscriptChunk = {
  words: TranscriptWord[];
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
  text: string;
};

export type Transcript = {
  id: string;
  chunks: TranscriptChunk[];
  language: string;
  totalDuration: number; // in milliseconds
};

export type SRTSegment = {
  index: number;
  startTime: string; // SRT format: "00:00:01,000"
  endTime: string; // SRT format: "00:00:02,000"
  text: string;
};
```
🤖 Prompt for AI Agents
In apps/web/src/types/transcript.ts from lines 5 to 30, the interfaces are
exported using `export interface`. To follow the project's TypeScript
conventions, change all `export interface` declarations to `export type`
declarations while keeping the structure intact. This means converting each
interface to a type alias with the same shape and exporting it using `export
type`.
Description
This PR adds speech-to-text (STT) support, enabling transcription of audio tracks using Hugging Face Whisper models. It uses `@huggingface/transformers` in a web worker. Additional code was required to extract audio tracks from video.

Type of change
Screenshots (if applicable)
Checklist:
Additional context
Opening as Draft PR.
Summary by CodeRabbit
New Features
Chores