
Conversation


@alamshafil commented Aug 6, 2025

Description

This PR adds speech-to-text (STT) support, enabling transcription of audio tracks using Hugging Face Whisper models. It runs @huggingface/transformers in a web worker; additional code was required to extract audio tracks from video.
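
For reviewers unfamiliar with the setup, the hand-off to the worker follows the standard postMessage pattern. A minimal sketch of the main-thread side (the exact message shape used in this PR may differ; the field names below are illustrative):

// Illustrative only: how the UI might hand audio to the transcription worker.
const worker = new Worker("/workers/speech-to-text.worker.js");

worker.onmessage = (event: MessageEvent) => {
  const { status, data } = event.data;
  if (status === "progress") console.log("Model loading:", data);
  if (status === "error") console.error(data.message);
  if (status === "complete") console.log("Transcript:", data);
};

// audio: mono Float32Array, resampled to the 16 kHz Whisper expects
const audio = new Float32Array(16_000); // one second of silence as a stand-in
worker.postMessage(
  { type: "transcribe", audio, model: "onnx-community/whisper-large-v3-turbo" },
  [audio.buffer]
);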

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots (if applicable)

[screenshot]

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have added screenshots if the UI has been changed
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Additional context

Opening as Draft PR.

Summary by CodeRabbit

  • New Features

    • Introduced comprehensive speech-to-text transcription capabilities, including model selection, device capability detection, and progress reporting.
    • Added a full-featured captions panel for generating, viewing, and managing subtitles from audio or video elements, with options to insert captions, download SRT files, and view detailed segments.
    • Enabled extraction and processing of audio from media tracks for transcription.
    • Provided utilities for transcript data handling and SRT subtitle generation.
  • Chores

    • Updated dependencies to include Hugging Face Transformers for speech recognition.
    • Enhanced project save/load functionality to include canvas size and mode.
    • Improved state management to reset speech-to-text state when switching projects.

- Introduce a new worker `speech-to-text.worker.ts` for handling speech recognition using Hugging Face's Whisper model.

vercel bot commented Aug 6, 2025

@alamshafil is attempting to deploy a commit to the OpenCut OSS Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Aug 6, 2025

Walkthrough

This update introduces a comprehensive speech-to-text (STT) transcription pipeline to the web application. It adds a new web worker for transcription using Hugging Face's Whisper models, a Zustand store for managing STT state and results, a full-featured captions UI, audio extraction utilities, and transcript data structures. Supporting changes ensure proper state management and persistence.
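
As a rough orientation, the store described above follows the usual Zustand pattern; a minimal sketch (state and action names here are simplified and not the store's actual API, apart from useSpeechToTextStore):

import { create } from "zustand";

// Simplified shape: the real store also tracks device capabilities, progress, etc.
interface SpeechToTextState {
  isWorkerInitialized: boolean;
  results: { id: string; text: string }[];
  resetState: () => void;
}

export const useSpeechToTextStore = create<SpeechToTextState>((set) => ({
  isWorkerInitialized: false,
  results: [],
  // Called when switching projects so stale transcripts don't leak across projects
  resetState: () => set({ isWorkerInitialized: false, results: [] }),
}));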

Changes

Cohort / File(s) Change Summary
Speech-to-Text Web Worker
apps/web/public/workers/speech-to-text.worker.js
Introduces a new web worker that dynamically loads Hugging Face's transformers library, manages ASR pipeline instances, and processes transcription requests with progress reporting, error handling, and streaming output for speech-to-text tasks.
Captions UI Component
apps/web/src/components/editor/media-panel/views/captions.tsx
Completely rewrites the captions component to provide a full-featured UI for speech-to-text caption generation, model selection, progress/status display, error handling, and result management, including segment viewing, timeline insertion, and SRT download.
Speech-to-Text Store
apps/web/src/stores/speech-to-text-store.ts
Adds a Zustand store to manage STT state, device capabilities, worker lifecycle, model selection, audio extraction, transcription processing, result management, timeline insertion, and SRT export, with robust error handling and UI integration.
Audio Extraction Utilities
apps/web/src/lib/audio-extraction.ts
Adds utilities for extracting, trimming, concatenating, normalizing, and resampling audio from timeline tracks and media elements, facilitating preparation of audio data for speech-to-text processing.
Transcript Types & Utilities
apps/web/src/types/transcript.ts
Introduces transcript data structures and utility functions for handling word/chunk timing, transcript creation from Whisper output, and SRT subtitle generation.
Project Store Integration
apps/web/src/stores/project-store.ts
Ensures speech-to-text store state is reset when loading a new project, maintaining consistency with media and timeline stores.
Storage Service Enhancement
apps/web/src/lib/storage/storage-service.ts
Adds canvasSize and canvasMode to project serialization and deserialization, ensuring these properties are persisted with project data.
Dependency Update
apps/web/package.json
Adds the @huggingface/transformers package as a new dependency for speech-to-text functionality.

Sequence Diagram(s)

sequenceDiagram
    participant UI as Captions UI
    participant Store as Speech-to-Text Store
    participant Worker as STT Worker
    participant HF as HuggingFace Transformers

    UI->>Store: User clicks "Generate Subtitles"
    Store->>Worker: Post message with audio data, model, options
    Worker->>HF: Dynamically import transformers & model
    Worker-->>Worker: Initialize ASR pipeline (if needed)
    Worker->>Worker: Transcribe audio with progress callbacks
    Worker-->>Store: Post progress and final transcript
    Store-->>UI: Update status, display results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

In the warren of code, a rabbit hops with glee,
Whispering words from sound, as clever as can be!
Audio stitched and models picked,
Subtitles bloom—transcripts clicked.
With progress bars and SRT delight,
This bunny’s captions now take flight!
🐇✨


@alamshafil (Author) commented

@coderabbitai review


coderabbitai bot commented Aug 6, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (6)
apps/web/src/lib/audio-extraction.ts (1)

89-90: Remove redundant length check.

The condition audioSegments.length > 0 is unnecessary since we've already verified that audioSegments is not empty at lines 82-84.

-  const timelineOffset = audioSegments.length > 0 ? audioSegments[0].startTime : 0;
+  const timelineOffset = audioSegments[0].startTime;
apps/web/public/workers/speech-to-text.worker.js (2)

7-21: Consider adding fallback for CDN failures and retry logic

Loading critical dependencies from CDN introduces a single point of failure. Consider implementing retry logic with exponential backoff and potentially a fallback to a self-hosted version.

Also, the error is being thrown after sending to the main thread (line 19), which might cause the worker to terminate unexpectedly. Consider whether you want to keep the worker alive for potential retry attempts.

-const initTransformers = async () => {
+const initTransformers = async (retryCount = 0, maxRetries = 3) => {
   try {
-    const transformers = await import('https://cdn.jsdelivr.net/npm/@huggingface/[email protected]/+esm');
+    // Add timeout to prevent hanging
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 30000);
+    
+    const transformers = await import('https://cdn.jsdelivr.net/npm/@huggingface/[email protected]/+esm');
+    clearTimeout(timeoutId);
+    
     pipeline = transformers.pipeline;
     WhisperTextStreamer = transformers.WhisperTextStreamer;
   } catch (error) {
     console.error('Failed to import transformers:', error);
+    
+    if (retryCount < maxRetries) {
+      console.log(`Retrying transformers import (${retryCount + 1}/${maxRetries})...`);
+      await new Promise(resolve => setTimeout(resolve, Math.pow(2, retryCount) * 1000));
+      return initTransformers(retryCount + 1, maxRetries);
+    }
+    
     // Send error back to main thread
     self.postMessage({
       status: "error",
       data: { message: "Failed to load AI model dependencies" }
     });
-    throw error;
+    // Don't throw - keep worker alive for potential recovery
   }
 };

167-167: Replace nullish coalescing assignment for broader compatibility

The nullish coalescing assignment operator (??=) might not be supported in all target environments.

-        start_time ??= performance.now();
+        if (start_time === null) {
+          start_time = performance.now();
+        }
apps/web/src/components/editor/media-panel/views/captions.tsx (1)

457-457: Add type annotation for chunk parameter

The chunk parameter in the map function should be properly typed.

-            {result.chunks.map((chunk: any, index: number) => (
+            {result.chunks.map((chunk, index) => (
apps/web/src/stores/speech-to-text-store.ts (2)

16-39: Add proper WebGPU type declarations instead of using 'as any'

Using 'as any' bypasses TypeScript's type checking. Consider adding proper WebGPU type declarations.

+// Add at the top of the file
+interface GPU {
+  requestAdapter(): Promise<GPUAdapter | null>;
+}
+
+interface GPUAdapter {
+  requestDevice(): Promise<GPUDevice>;
+}
+
+interface GPUDevice {
+  destroy(): void;
+}
+
+interface NavigatorGPU {
+  gpu?: GPU;
+}
+
 async function detectWebGPU(): Promise<boolean> {
   try {
-    if (!(navigator as any).gpu) {
+    const nav = navigator as unknown as NavigatorGPU;
+    if (!nav.gpu) {
       return false;
     }

-    const adapter = await (navigator as any).gpu.requestAdapter();
+    const adapter = await nav.gpu.requestAdapter();
     if (!adapter) {
       return false;
     }

745-803: Consider reusing AudioContext for better performance

Creating a new AudioContext for each extraction could be expensive if multiple extractions are performed. Consider reusing a single context or implementing a pool.

+// Add as a module-level variable or store property
+let sharedAudioContext: AudioContext | null = null;
+
+const getAudioContext = (): AudioContext => {
+  if (!sharedAudioContext || sharedAudioContext.state === 'closed') {
+    sharedAudioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+  }
+  return sharedAudioContext;
+};

   extractAudioFromElement: async (element: any, mediaItem: any): Promise<{ audioData: Float32Array; sampleRate: number }> => {
     if (!mediaItem?.file) {
       throw new Error('No media file found for selected element');
     }

     try {
       // Load the audio file
       const arrayBuffer = await mediaItem.file.arrayBuffer();
-      const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+      const audioContext = getAudioContext();
       
       const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
       
       // ... rest of the extraction logic ...
       
-      audioContext.close();
+      // Don't close the shared context
+      // audioContext.close();
       
       return {
         audioData: resampledData,
         sampleRate: targetSampleRate
       };
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f872485 and 2f3ed98.

📒 Files selected for processing (8)
  • apps/web/package.json (1 hunks)
  • apps/web/public/workers/speech-to-text.worker.js (1 hunks)
  • apps/web/src/components/editor/media-panel/views/captions.tsx (1 hunks)
  • apps/web/src/lib/audio-extraction.ts (1 hunks)
  • apps/web/src/lib/storage/storage-service.ts (2 hunks)
  • apps/web/src/stores/project-store.ts (2 hunks)
  • apps/web/src/stores/speech-to-text-store.ts (1 hunks)
  • apps/web/src/types/transcript.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.
📚 Learning: applies to **/*.{ts,tsx} : use `import type` for types....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{ts,tsx} : Use `import type` for types.

Applied to files:

  • apps/web/src/stores/project-store.ts
📚 Learning: the file apps/web/src/components/editor/media-panel/views/media.tsx uses "use client" directive, mak...
Learnt from: khanguyen74
PR: OpenCut-app/OpenCut#466
File: apps/web/src/components/editor/media-panel/views/media.tsx:47-52
Timestamp: 2025-07-26T21:07:57.582Z
Learning: The file apps/web/src/components/editor/media-panel/views/media.tsx uses "use client" directive, making it client-only code where window object is always available, so SSR safety checks are not needed.

Applied to files:

  • apps/web/src/stores/project-store.ts
  • apps/web/src/components/editor/media-panel/views/captions.tsx
  • apps/web/src/stores/speech-to-text-store.ts
📚 Learning: applies to **/*.{ts,tsx} : use `export type` for types....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{ts,tsx} : Use `export type` for types.

Applied to files:

  • apps/web/src/types/transcript.ts
📚 Learning: applies to **/*.{jsx,tsx} : include caption tracks for audio and video elements....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Include caption tracks for audio and video elements.

Applied to files:

  • apps/web/src/components/editor/media-panel/views/captions.tsx
  • apps/web/src/lib/audio-extraction.ts
  • apps/web/src/stores/speech-to-text-store.ts
📚 Learning: applies to **/*.{jsx,tsx} : give all elements requiring alt text meaningful information for screen r...
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Give all elements requiring alt text meaningful information for screen readers.

Applied to files:

  • apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : use semantic elements instead of role attributes in jsx....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Use semantic elements instead of role attributes in JSX.

Applied to files:

  • apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : give heading elements content that's accessible to screen readers (not h...
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Give heading elements content that's accessible to screen readers (not hidden with `aria-hidden`).

Applied to files:

  • apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : make sure anchors have content that's accessible to screen readers....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{jsx,tsx} : Make sure anchors have content that's accessible to screen readers.

Applied to files:

  • apps/web/src/components/editor/media-panel/views/captions.tsx
📚 Learning: applies to **/*.{jsx,tsx} : don't assign non-interactive aria roles to interactive html elements....
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{jsx,tsx} : Don't assign non-interactive ARIA roles to interactive HTML elements.

Applied to files:

  • apps/web/src/components/editor/media-panel/views/captions.tsx
🧬 Code Graph Analysis (2)
apps/web/src/stores/project-store.ts (1)
apps/web/src/stores/speech-to-text-store.ts (1)
  • useSpeechToTextStore (164-805)
apps/web/src/lib/audio-extraction.ts (2)
apps/web/src/types/timeline.ts (1)
  • TimelineTrack (82-89)
apps/web/src/stores/media-store.ts (1)
  • MediaItem (8-26)
🔇 Additional comments (20)
apps/web/package.json (1)

22-22: LGTM!

The addition of @huggingface/transformers dependency is appropriate for implementing the speech-to-text functionality.

apps/web/src/stores/project-store.ts (2)

7-7: LGTM!

The import is correctly placed and follows the same pattern as other store imports.


190-193: LGTM!

Properly resets speech-to-text state when loading a project, maintaining consistency with other store resets.

apps/web/src/lib/storage/storage-service.ts (2)

77-78: LGTM!

Canvas properties are correctly added to project serialization.


101-102: LGTM!

Canvas properties are correctly restored during project deserialization, maintaining consistency with the serialization format.

apps/web/src/types/transcript.ts (3)

35-46: LGTM!

The time conversion function correctly handles the SRT format with proper padding.


51-64: LGTM!

The SRT generation correctly formats transcript chunks into valid SRT format.
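
For context, SRT timestamps follow the `HH:MM:SS,mmm` pattern noted in the type comments; a minimal sketch of such a conversion, assuming millisecond input (not necessarily the exact code under review):

// Convert milliseconds to an SRT timestamp such as "00:01:01,500".
function msToSrtTime(ms: number): string {
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3_600_000);
  const minutes = Math.floor((ms % 3_600_000) / 60_000);
  const seconds = Math.floor((ms % 60_000) / 1000);
  const millis = Math.floor(ms % 1000);
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)},${pad(millis, 3)}`;
}

// msToSrtTime(61_500) === "00:01:01,500"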


69-142: LGTM!

Comprehensive implementation with robust fallback logic for handling various Whisper output formats.

apps/web/src/lib/audio-extraction.ts (5)

5-6: LGTM!

Correctly uses import type for type imports following project conventions.


101-113: LGTM!

Proper AudioContext lifecycle management with cleanup in both success and error paths.
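
For readers skimming the review, the decode-and-cleanup shape being praised here typically looks roughly like this (a sketch, not the file's exact code):

// Decode a media File into an AudioBuffer, closing the AudioContext on every path.
async function decodeAudioFile(file: File): Promise<AudioBuffer> {
  const ctx = new AudioContext();
  try {
    const bytes = await file.arrayBuffer();
    return await ctx.decodeAudioData(bytes);
  } finally {
    await ctx.close();
  }
}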


170-215: LGTM!

Well-implemented audio concatenation with proper mixing and normalization.


220-235: LGTM!

Correct normalization implementation with safe peak level.
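
Peak normalization of this kind is usually only a few lines; a sketch for context (the 0.95 target is an assumption, not necessarily the value used in the file):

// Scale samples so the absolute peak sits at a safe level below 1.0 to avoid clipping.
function normalizePeak(samples: Float32Array, targetPeak = 0.95): Float32Array {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return samples; // silence: nothing to scale
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}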


240-269: LGTM!

Appropriate linear interpolation resampling for speech recognition preprocessing.
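
For context, linear-interpolation resampling down to the 16 kHz Whisper expects usually looks something like this (illustrative sketch, not the implementation in audio-extraction.ts):

// Resample mono PCM from sourceRate to targetRate by interpolating between neighbours.
function resampleLinear(
  input: Float32Array,
  sourceRate: number,
  targetRate = 16_000
): Float32Array {
  if (sourceRate === targetRate) return input;
  const ratio = sourceRate / targetRate;
  const output = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < output.length; i++) {
    const pos = i * ratio;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const next = Math.min(idx + 1, input.length - 1);
    output[i] = input[idx] * (1 - frac) + input[next] * frac;
  }
  return output;
}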

apps/web/public/workers/speech-to-text.worker.js (3)

64-103: LGTM!

The message handler properly handles test connectivity, lazy initialization, and error cases with appropriate error message extraction.


233-248: Add safety checks for chunk array access and optimize filtering

The code accesses the last element of chunks array without checking if it exists after filtering.

     // Format and validate chunks
     const formattedChunks = chunks
       .filter(chunk => chunk.finalised && chunk.text.trim())
       .map(chunk => {
         const startTime = chunk.timestamp[0];
         const endTime = chunk.timestamp[1] || startTime;
         return {
           text: chunk.text.trim(),
           timestamp: [startTime, endTime]
         };
       });
+    
+    // Safely get the last timestamp
+    const lastChunk = formattedChunks[formattedChunks.length - 1];
+    const finalTimestamp = lastChunk ? lastChunk.timestamp[1] : 0;

     return {
       tps,
       text: output.text || '',
       chunks: formattedChunks,
     };
⛔ Skipped due to learnings
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Make sure iterable callbacks return consistent values.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `Array#{indexOf,lastIndexOf}()` instead of `Array#{findIndex,findLastIndex}()` when looking for the index of an item.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.403Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use sparse arrays (arrays with holes).
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `String.slice()` instead of `String.substr()` and `String.substring()`.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use `String.trimStart()` and `String.trimEnd()` over `String.trimLeft()` and `String.trimRight()`.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Make sure "for" loop update clauses move the counter in the right direction.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use spread (`...`) syntax on accumulators.

133-143: Fix validation logic to avoid optional chaining with non-optional property access

The validation uses optional chaining but then accesses nested properties that might not exist, which could throw runtime errors.

-    if (!transcriber?.processor?.feature_extractor?.config) {
+    if (!transcriber || !transcriber.processor || !transcriber.processor.feature_extractor || !transcriber.processor.feature_extractor.config) {
       throw new Error('Invalid transcriber configuration - missing feature extractor');
     }

-    if (!transcriber.model?.config?.max_source_positions) {
+    if (!transcriber.model || !transcriber.model.config || !transcriber.model.config.max_source_positions) {
       throw new Error('Invalid transcriber configuration - missing model config');
     }

     const time_precision = 
       transcriber.processor.feature_extractor.config.chunk_length /
       transcriber.model.config.max_source_positions;
⛔ Skipped due to learnings
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
apps/web/src/components/editor/media-panel/views/captions.tsx (2)

87-327: LGTM! Accessibility implementation follows best practices

The component properly uses semantic HTML elements, provides labels for form controls, and includes tooltips for icon buttons, adhering to the accessibility requirements from the retrieved learnings.


350-353: Add safety check for empty chunks array

Accessing the last element of chunks array without checking if it exists could cause runtime errors.

   const totalDuration = result.chunks.length > 0 
-    ? result.chunks[result.chunks.length - 1]?.timestamp[1] 
+    ? result.chunks[result.chunks.length - 1]?.timestamp?.[1] || 0
     : 0;
⛔ Skipped due to learnings
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use concise optional chaining instead of chained logical expressions.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Don't use optional chaining where undefined values aren't allowed.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use Date.now() to get milliseconds since the Unix Epoch.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .cursor/rules/ultracite.mdc:0-0
Timestamp: 2025-07-27T22:15:27.748Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use Date.now() to get milliseconds since the Unix Epoch.
Learnt from: CR
PR: OpenCut-app/OpenCut#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-27T22:14:46.402Z
Learning: Applies to **/*.{js,jsx,ts,tsx} : Use isNaN() when checking for NaN.
apps/web/src/stores/speech-to-text-store.ts (2)

201-434: LGTM! Robust worker initialization with proper error handling

The worker initialization properly manages promises to prevent race conditions, includes comprehensive error handling, and implements cleanup on failure. Good use of the initialization promise pattern.


511-528: LGTM! Proper state cleanup and resource management

The resetState function properly terminates the worker and resets state while preserving device capabilities and model configuration that don't need to be re-detected.

Comment on lines +27 to +49
  static async getInstance(model, progressCallback) {
    if (this.instance === null || this.currentModel !== model) {
      if (this.instance) {
        try {
          await this.instance.dispose();
        } catch (error) {
          console.warn('Failed to dispose previous model instance:', error);
        }
        this.instance = null;
      }

      this.currentModel = model;
      this.instance = await pipeline("automatic-speech-recognition", model, {
        dtype: {
          encoder_model: model === "onnx-community/whisper-large-v3-turbo" ? "fp16" : "fp32",
          decoder_model_merged: "q4",
        },
        device: "webgpu",
        progress_callback: progressCallback,
      });
    }
    return this.instance;
  }

⚠️ Potential issue

Make device configuration dynamic and improve dtype selection

The device is hardcoded to 'webgpu' (line 44) but WebGPU might not be available on all devices. The store detects capabilities but this information isn't passed to the worker. Also, the dtype selection only checks for one specific model.

-  static async getInstance(model, progressCallback) {
+  static async getInstance(model, progressCallback, deviceConfig = { device: 'webgpu', dtype: 'fp32' }) {
     if (this.instance === null || this.currentModel !== model) {
       if (this.instance) {
         try {
           await this.instance.dispose();
         } catch (error) {
           console.warn('Failed to dispose previous model instance:', error);
         }
         this.instance = null;
       }

       this.currentModel = model;
+      
+      // Determine dtype based on model and device
+      const getDtype = (model, device) => {
+        const isLargeModel = model.includes('large') || model.includes('medium');
+        const canUseFp16 = device === 'webgpu' && !model.includes('whisper-base');
+        
+        return {
+          encoder_model: canUseFp16 ? 'fp16' : 'fp32',
+          decoder_model_merged: isLargeModel ? 'q4' : 'q8'
+        };
+      };
+      
       this.instance = await pipeline("automatic-speech-recognition", model, {
-        dtype: {
-          encoder_model: model === "onnx-community/whisper-large-v3-turbo" ? "fp16" : "fp32",
-          decoder_model_merged: "q4",
-        },
-        device: "webgpu",
+        dtype: deviceConfig.customDtype || getDtype(model, deviceConfig.device),
+        device: deviceConfig.device,
         progress_callback: progressCallback,
       });
     }
     return this.instance;
   }
🤖 Prompt for AI Agents
In apps/web/public/workers/speech-to-text.worker.js between lines 27 and 49, the
device is hardcoded to 'webgpu', which may not be supported on all devices, and
dtype selection only checks for one specific model. Modify the getInstance
method to accept device capability information as a parameter or retrieve it
dynamically, then use this to set the device option instead of hardcoding
'webgpu'. Also, enhance the dtype selection logic to handle multiple models or
make it configurable rather than only checking for
"onnx-community/whisper-large-v3-turbo".

Comment on lines +151 to +154
let start_time = null;
let num_tokens = 0;
let tps;


⚠️ Potential issue

Initialize tps variable to avoid potential undefined value

The tps variable is declared without initialization and might remain undefined if no tokens are processed.

     let chunk_count = 0;
     let start_time = null;
     let num_tokens = 0;
-    let tps;
+    let tps = 0;
🤖 Prompt for AI Agents
In apps/web/public/workers/speech-to-text.worker.js around lines 151 to 154, the
variable tps is declared but not initialized, which can lead to it being
undefined if no tokens are processed. Initialize tps to a default value such as
0 to ensure it always has a defined numeric value.

Comment on lines +172 to +202
    onClick={async () => {
      if (!selectedElementInfo) {
        return;
      }

      try {
        // Always ensure worker is initialized first
        if (!isWorkerInitialized) {
          await handleInitialize();

          // Wait for the worker to actually be initialized
          // We'll poll the store state until isWorkerInitialized becomes true
          let retries = 0;
          const maxRetries = 50; // 5 seconds max wait
          while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < maxRetries) {
            await new Promise(resolve => setTimeout(resolve, 100));
            retries++;
          }

          if (!useSpeechToTextStore.getState().isWorkerInitialized) {
            console.error('Worker initialization timed out');
            return;
          }
        }

        // Then process the element
        await handleProcess();
      } catch (error) {
        console.error('Error in generate subtitles:', error);
      }
    }}

🛠️ Refactor suggestion

Extract complex initialization logic and add timeout safeguards

The button's onClick handler contains complex initialization and polling logic with magic numbers. This should be extracted to a separate function for better maintainability.

+  const initializeAndProcess = async () => {
+    const MAX_RETRIES = 50;
+    const RETRY_DELAY_MS = 100;
+    
+    if (!selectedElementInfo) {
+      return;
+    }
+    
+    try {
+      // Always ensure worker is initialized first
+      if (!isWorkerInitialized) {
+        await handleInitialize();
+        
+        // Wait for the worker to actually be initialized
+        let retries = 0;
+        while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < MAX_RETRIES) {
+          await new Promise(resolve => setTimeout(resolve, RETRY_DELAY_MS));
+          retries++;
+        }
+        
+        if (!useSpeechToTextStore.getState().isWorkerInitialized) {
+          throw new Error('Worker initialization timed out');
+        }
+      }
+      
+      // Then process the element
+      await handleProcess();
+    } catch (error) {
+      console.error('Error in generate subtitles:', error);
+      // Consider showing user-friendly error message
+    }
+  };

   <Button 
-    onClick={async () => {
-      if (!selectedElementInfo) {
-        return;
-      }
-      
-      try {
-        // Always ensure worker is initialized first
-        if (!isWorkerInitialized) {
-          await handleInitialize();
-          
-          // Wait for the worker to actually be initialized
-          // We'll poll the store state until isWorkerInitialized becomes true
-          let retries = 0;
-          const maxRetries = 50; // 5 seconds max wait
-          while (!useSpeechToTextStore.getState().isWorkerInitialized && retries < maxRetries) {
-            await new Promise(resolve => setTimeout(resolve, 100));
-            retries++;
-          }
-          
-          if (!useSpeechToTextStore.getState().isWorkerInitialized) {
-            console.error('Worker initialization timed out');
-            return;
-          }
-        }
-        
-        // Then process the element
-        await handleProcess();
-      } catch (error) {
-        console.error('Error in generate subtitles:', error);
-      }
-    }}
+    onClick={initializeAndProcess}
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
172 to 202, the onClick handler has complex worker initialization and polling
logic with hardcoded retry counts and delays. Extract this logic into a separate
async function that handles worker initialization with a configurable timeout
and retry mechanism. Replace the inline code in onClick with a call to this new
function to improve readability and maintainability.

Comment on lines +329 to +335
interface CaptionResultCardProps {
  result: any; // TranscriptionResult type from store
  onRemove: () => void;
  onInsertToTimeline: () => void;
  onDownloadSRT: () => void;
  onViewSegments: () => void;
}

⚠️ Potential issue

Replace 'any' type with proper TypeScript interface

Using 'any' type defeats TypeScript's type safety. Import and use the proper TranscriptionResult type.

+import type { TranscriptionResult } from "@/stores/speech-to-text-store";
+
 interface CaptionResultCardProps {
-  result: any; // TranscriptionResult type from store
+  result: TranscriptionResult;
   onRemove: () => void;
   onInsertToTimeline: () => void;
   onDownloadSRT: () => void;
   onViewSegments: () => void;
 }
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
329 to 335, replace the 'any' type used for the 'result' prop in
CaptionResultCardProps with the correct TranscriptionResult interface. Import
the TranscriptionResult type from the appropriate store module and update the
type annotation to use it instead of 'any' to ensure proper type safety.

Comment on lines +431 to +434
interface SegmentViewerProps {
  result: any; // TranscriptionResult
  onClose: () => void;
}

⚠️ Potential issue

Use proper TypeScript types instead of 'any'

Replace 'any' types with proper interfaces for better type safety.

+import type { TranscriptionResult } from "@/stores/speech-to-text-store";
+
 interface SegmentViewerProps {
-  result: any; // TranscriptionResult
+  result: TranscriptionResult | undefined;
   onClose: () => void;
 }
🤖 Prompt for AI Agents
In apps/web/src/components/editor/media-panel/views/captions.tsx around lines
431 to 434, the SegmentViewerProps interface uses 'any' for the 'result'
property, which reduces type safety. Replace 'any' with the correct TypeScript
interface that represents the transcription result, such as
'TranscriptionResult' or the appropriate type defined elsewhere in the codebase,
to ensure proper type checking and improve code reliability.

Comment on lines +134 to +143
if (actualDurationSamples <= 0) {
// Return silent buffer if no valid audio range
const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)(); const silentBuffer = audioContext.createBuffer(
audioBuffer.numberOfChannels,
1,
sampleRate
);
audioContext.close();
return silentBuffer;
}

🛠️ Refactor suggestion

⚠️ Potential issue

Fix formatting and optimize AudioContext usage.

There's a formatting issue at line 136 and the function creates multiple AudioContext instances.

   if (actualDurationSamples <= 0) {
     // Return silent buffer if no valid audio range
-  const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();    const silentBuffer = audioContext.createBuffer(
+    const audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();
+    const silentBuffer = audioContext.createBuffer(
       audioBuffer.numberOfChannels,
       1,
       sampleRate
     );
     audioContext.close();
     return silentBuffer;
   }

Also consider creating a single AudioContext at the beginning of the function to avoid multiple instantiations.

🤖 Prompt for AI Agents
In apps/web/src/lib/audio-extraction.ts around lines 134 to 143, fix the
formatting issue at line 136 by properly indenting the code and splitting the
combined statements into separate lines. Additionally, refactor the function to
create a single AudioContext instance at the beginning and reuse it throughout
the function to avoid multiple instantiations, ensuring to close the
AudioContext appropriately after use.

Comment on lines +536 to +638
  insertResultToTimeline: (resultId: string, mode: 'sentences' | 'words' = 'sentences') => {
    const state = get();
    const result = state.results.find(r => r.id === resultId);

    if (!result) {
      return;
    }

    const timelineStore = useTimelineStore.getState();
    const elementInfo = state.getSelectedElementInfo();

    if (!elementInfo) {
      console.warn('No element selected - cannot determine timeline position');
      return;
    }

    // Calculate timing offset based on the element's position in the timeline
    const element = elementInfo.element;

    // IMPORTANT: The timestamps from transcription are relative to the EXTRACTED audio
    // which starts from trimStart, not from the beginning of the original file
    // So we need to add: element.startTime (timeline position)
    // Note: We DON'T add trimStart because the transcription timestamps are already relative to the extracted portion
    const timelineOffset = element.startTime;

    // Prepare all text elements in advance
    const textElements: TextElement[] = [];

    if (mode === 'words') {
      // Insert individual words as text elements
      let wordCount = 0;
      result.transcript.chunks.forEach((chunk) => {
        chunk.words.forEach((word) => {
          const startTime = timelineOffset + (word.startTime / 1000); // Convert from ms to seconds
          const duration = (word.endTime - word.startTime) / 1000; // Convert from ms to seconds

          const textElement: TextElement = {
            id: crypto.randomUUID(),
            type: 'text',
            name: `Word ${wordCount + 1}: ${word.text}`,
            content: word.text,
            duration: Math.max(duration, 0.5), // Minimum 0.5 second duration for words
            startTime: startTime,
            trimStart: 0,
            trimEnd: 0,
            fontSize: 36,
            fontFamily: 'Arial',
            color: '#ffffff',
            backgroundColor: 'rgba(0, 0, 0, 0.7)',
            textAlign: 'center' as const,
            fontWeight: 'bold' as const,
            fontStyle: 'normal' as const,
            textDecoration: 'none' as const,
            x: 0,
            y: 200,
            rotation: 0,
            opacity: 1
          };

          textElements.push(textElement);
          wordCount++;
        });
      });

    } else {
      // Insert sentence chunks as text elements
      result.chunks.forEach((chunk, index) => {
        const duration = chunk.timestamp[1] - chunk.timestamp[0];
        // Add the timeline offset to get the absolute position
        const startTime = timelineOffset + chunk.timestamp[0];

        // Create a more descriptive name for the text element
        const words = chunk.text.trim().split(/\s+/);
        const shortText = words.length > 3
          ? words.slice(0, 3).join(' ') + '...'
          : chunk.text;

        const textElement: TextElement = {
          id: crypto.randomUUID(),
          type: 'text',
          name: `Subtitle ${index + 1}: ${shortText}`,
          content: chunk.text.trim(),
          duration: Math.max(duration, 1), // Ensure minimum 1 second duration
          startTime: startTime,
          trimStart: 0,
          trimEnd: 0,
          fontSize: 36,
          fontFamily: 'Arial',
          color: '#ffffff',
          backgroundColor: 'rgba(0, 0, 0, 0.7)', // Semi-transparent background for readability
          textAlign: 'center' as const,
          fontWeight: 'bold' as const,
          fontStyle: 'normal' as const,
          textDecoration: 'none' as const,
          x: 0, // Center horizontally
          y: 200, // Position at bottom (positive y moves down from center)
          rotation: 0,
          opacity: 1
        };

        textElements.push(textElement);
      });
    }

🛠️ Refactor suggestion

Extract text element styling to constants for better maintainability

The text element creation has many hardcoded style values that should be extracted to constants or configuration.

+// Add at the top of the file or in a separate config
+const DEFAULT_TEXT_ELEMENT_STYLE = {
+  fontSize: 36,
+  fontFamily: 'Arial',
+  color: '#ffffff',
+  backgroundColor: 'rgba(0, 0, 0, 0.7)',
+  textAlign: 'center' as const,
+  fontWeight: 'bold' as const,
+  fontStyle: 'normal' as const,
+  textDecoration: 'none' as const,
+  x: 0,
+  y: 200,
+  rotation: 0,
+  opacity: 1
+};
+
+const MINIMUM_WORD_DURATION = 0.5;
+const MINIMUM_SENTENCE_DURATION = 1.0;

   if (mode === 'words') {
     // Insert individual words as text elements
     let wordCount = 0;
     result.transcript.chunks.forEach((chunk) => {
       chunk.words.forEach((word) => {
         const startTime = timelineOffset + (word.startTime / 1000);
         const duration = (word.endTime - word.startTime) / 1000;
         
         const textElement: TextElement = {
           id: crypto.randomUUID(),
           type: 'text',
           name: `Word ${wordCount + 1}: ${word.text}`,
           content: word.text,
-          duration: Math.max(duration, 0.5),
+          duration: Math.max(duration, MINIMUM_WORD_DURATION),
           startTime: startTime,
           trimStart: 0,
           trimEnd: 0,
-          fontSize: 36,
-          fontFamily: 'Arial',
-          color: '#ffffff',
-          backgroundColor: 'rgba(0, 0, 0, 0.7)',
-          textAlign: 'center' as const,
-          fontWeight: 'bold' as const,
-          fontStyle: 'normal' as const,
-          textDecoration: 'none' as const,
-          x: 0,
-          y: 200,
-          rotation: 0,
-          opacity: 1
+          ...DEFAULT_TEXT_ELEMENT_STYLE
         };
🤖 Prompt for AI Agents
In apps/web/src/stores/speech-to-text-store.ts around lines 536 to 638, the text
element creation uses many hardcoded style values such as fontSize, fontFamily,
color, backgroundColor, textAlign, fontWeight, fontStyle, textDecoration, x, y,
rotation, and opacity. To improve maintainability, extract these repeated style
properties into constants or a configuration object at the top of the file or in
a separate styles module. Then, reference these constants when creating text
elements instead of hardcoding the values directly.

Comment on lines +5 to +30
export interface TranscriptWord {
  text: string;
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
}

export interface TranscriptChunk {
  words: TranscriptWord[];
  startTime: number; // in milliseconds
  endTime: number; // in milliseconds
  text: string;
}

export interface Transcript {
  id: string;
  chunks: TranscriptChunk[];
  language: string;
  totalDuration: number; // in milliseconds
}

export interface SRTSegment {
  index: number;
  startTime: string; // SRT format: "00:00:01,000"
  endTime: string; // SRT format: "00:00:02,000"
  text: string;
}

🛠️ Refactor suggestion

Use export type for type exports.

According to the project's TypeScript conventions, types should be exported using export type.

Apply this diff to follow the project's type export convention:

-export interface TranscriptWord {
+export type TranscriptWord = {
   text: string;
   startTime: number; // in milliseconds
   endTime: number; // in milliseconds
-}
+};

-export interface TranscriptChunk {
+export type TranscriptChunk = {
   words: TranscriptWord[];
   startTime: number; // in milliseconds
   endTime: number; // in milliseconds
   text: string;
-}
+};

-export interface Transcript {
+export type Transcript = {
   id: string;
   chunks: TranscriptChunk[];
   language: string;
   totalDuration: number; // in milliseconds
-}
+};

-export interface SRTSegment {
+export type SRTSegment = {
   index: number;
   startTime: string; // SRT format: "00:00:01,000"
   endTime: string; // SRT format: "00:00:02,000"
   text: string;
-}
+};
🤖 Prompt for AI Agents
In apps/web/src/types/transcript.ts from lines 5 to 30, the interfaces are
exported using `export interface`. To follow the project's TypeScript
conventions, change all `export interface` declarations to `export type`
declarations while keeping the structure intact. This means converting each
interface to a type alias with the same shape and exporting it using `export
type`.
