Commit b650b73

2 parents fd0baf2 + 04e4664 commit b650b73

1 file changed

+61
-0
lines changed

README.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
# ASL Translation Data Preprocessing
2+
3+
This repository preprocesses American Sign Language (ASL) datasets, specifically How2Sign and YouTube ASL. The pipeline covers the workflow from video acquisition to landmark extraction, producing data ready for ASL translation tasks.
4+
5+
## Project Configuration
6+
7+
All project settings are managed centrally in `conf.py`, which covers video processing, dataset management, and feature extraction. The configuration file controls several key aspects:
8+
9+
Video processing parameters, including the frame skip rate and the maximum frame limit, can be adjusted to trade processing time against data quality. The file also defines the dataset paths and directories, the MediaPipe landmark indices used to capture pose, face, and hand movements, and the language preferences for YouTube transcript collection, which cover several English variants.
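To make the settings above concrete, here is a minimal sketch of what such a `conf.py` could look like. Every name and value in it (`FRAME_SKIP`, `MAX_FRAMES`, the directory names, the chosen landmark indices, the language list) is an assumption for illustration and is not taken from the repository.

```python
# Hypothetical conf.py layout; all names and values are illustrative assumptions.
from pathlib import Path

# Video processing
FRAME_SKIP = 2      # keep every 2nd frame
MAX_FRAMES = 256    # upper bound on frames kept per segment

# Dataset paths and directories
DATA_ROOT = Path("dataset")
VIDEO_DIR = DATA_ROOT / "videos"
TRANSCRIPT_DIR = DATA_ROOT / "transcripts"
LANDMARK_DIR = DATA_ROOT / "landmarks"

# MediaPipe landmark indices (Holistic: 33 pose, 468 face, 21 hand points)
POSE_IDX = [11, 12, 13, 14, 23, 24]   # shoulders, elbows, hips
FACE_IDX = [0, 13, 14, 17]            # illustrative subset only
HAND_IDX = list(range(21))            # every hand landmark

# Preferred transcript languages, in priority order
LANGUAGES = ["en", "en-US", "en-GB"]
```

Keeping these values in one module lets each script import the same paths and indices, so the shape of the saved landmark arrays stays consistent across steps.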
10+
11+
## YouTube ASL Dataset Processing
12+
13+
Processing the YouTube ASL dataset follows three steps:
14+
15+
### Step 1: Data Acquisition
16+
17+
This initial phase combines two parallel processes:
18+
19+
The video downloader (`s1_video_download.py`) retrieves YouTube videos using yt-dlp, with rate limiting and quality settings applied. It supports parallel fragment downloads and skips previously downloaded videos to avoid redundant work.
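The downloader behaviour described above maps onto standard yt-dlp options. The sketch below is not the code in `s1_video_download.py`; the function names, the format selector, and the default values are assumptions, while the option keys themselves are regular `YoutubeDL` parameters.

```python
from pathlib import Path


def build_ydl_options(video_dir, archive_file, rate_limit_bytes=5_000_000, fragments=4):
    """Build a yt-dlp options dict (values are illustrative assumptions)."""
    return {
        # Quality control: prefer MP4 video up to 720p, fall back to the best available.
        "format": "bestvideo[height<=720][ext=mp4]/best[ext=mp4]/best",
        "outtmpl": str(Path(video_dir) / "%(id)s.%(ext)s"),
        # Rate limiting: cap bandwidth and pause between downloads.
        "ratelimit": rate_limit_bytes,
        "sleep_interval": 1,
        "max_sleep_interval": 5,
        # Parallel fragment downloads for fragmented (DASH/HLS) streams.
        "concurrent_fragment_downloads": fragments,
        # IDs recorded in this file are skipped on later runs.
        "download_archive": str(archive_file),
        "ignoreerrors": True,
    }


def download_videos(video_ids, options):
    """Download the given video IDs; yt-dlp is imported lazily."""
    import yt_dlp

    urls = [f"https://www.youtube.com/watch?v={vid}" for vid in video_ids]
    with yt_dlp.YoutubeDL(options) as ydl:
        ydl.download(urls)
```

A call such as `download_videos(ids, build_ydl_options("videos", "downloaded.txt"))` would then skip any ID already listed in `downloaded.txt`.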
20+
21+
In parallel, the transcript collector (`s1_transcript_downloader.py`) obtains video transcripts through the YouTube Transcript API. It handles multiple English language variants, saves each transcript as structured JSON, and rate-limits its requests so that collection stays reliable.
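A transcript collector of this kind could be sketched as follows. The function names, the one-file-per-video JSON layout, and the delay value are assumptions. `fetch_with_api` also assumes youtube-transcript-api 1.0 or later; older releases expose `YouTubeTranscriptApi.get_transcript(video_id, languages=...)` instead.

```python
import json
import time
from pathlib import Path


def save_transcript(video_id, fetch, out_dir, languages=("en", "en-US", "en-GB"), delay_s=0.0):
    """Fetch one transcript with `fetch` and store it as <video_id>.json."""
    out_path = Path(out_dir) / f"{video_id}.json"
    if out_path.exists():
        return out_path  # already collected, skip the request
    entries = fetch(video_id, list(languages))
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    time.sleep(delay_s)  # simple rate limit between requests
    return out_path


def fetch_with_api(video_id, languages):
    """Fetch via youtube-transcript-api (assumes version >= 1.0)."""
    from youtube_transcript_api import YouTubeTranscriptApi

    transcript = YouTubeTranscriptApi().fetch(video_id, languages=languages)
    return transcript.to_raw_data()  # list of {"text", "start", "duration"} dicts
```

Passing the fetch function as an argument keeps the file handling testable without network access; in real use it would be called as `save_transcript(video_id, fetch_with_api, "transcripts", delay_s=1.0)`.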
22+
23+
### Step 2: Transcript Enhancement
24+
25+
The transcript processor (`s2_transcript_preprocess.py`) converts the raw transcript data into a format suitable for ASL translation. It normalizes the text, including Unicode handling and ASCII conversion, while preserving meaning. It then segments each video into overlapping chunks with start and end times and writes the processed segments to CSV files.
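The two operations named above can be illustrated with a short sketch. This is one possible reading, not the logic of `s2_transcript_preprocess.py`: the function names are hypothetical, and the fixed window and stride used to produce overlapping chunks are an assumption about how the overlap is defined.

```python
import re
import unicodedata


def normalize_text(text):
    """Decompose Unicode, drop non-ASCII characters, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_only).strip()


def overlapping_windows(duration_s, window_s, stride_s):
    """Return (start, end) pairs that overlap whenever stride_s < window_s."""
    if window_s <= 0 or stride_s <= 0:
        raise ValueError("window_s and stride_s must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += stride_s
    return windows
```

For example, `overlapping_windows(10.0, 4.0, 2.0)` yields four windows, each sharing two seconds with its neighbour. Note that dropping non-ASCII characters also removes typographic quotes, which may or may not be desirable for a given transcript.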
26+
27+
### Step 3: Feature Extraction
28+
29+
The landmark extraction script (`s3_mediapipe_labelling.py`) uses the MediaPipe Holistic model to capture movement data. It processes video segments to extract pose, face, and hand landmarks, running segments in parallel to reduce total processing time. The extracted features are stored as NumPy arrays for subsequent analysis and translation tasks.
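As a rough illustration of this step, the sketch below runs MediaPipe Holistic over one time range of a video and saves the selected landmarks. The function names, the zero-filling of undetected parts, and the `[pose, face, left hand, right hand]` ordering are assumptions, not the repository's actual layout.

```python
def flatten_landmarks(landmark_list, indices):
    """Return [x, y, z, ...] for the chosen indices, or zeros if nothing was detected."""
    if landmark_list is None:
        return [0.0] * (3 * len(indices))
    points = landmark_list.landmark
    values = []
    for i in indices:
        values.extend((points[i].x, points[i].y, points[i].z))
    return values


def extract_segment(video_path, start_s, end_s, out_path,
                    pose_idx, face_idx, hand_idx, frame_skip=1):
    """Extract landmarks between start_s and end_s and save them as a .npy file."""
    import cv2
    import mediapipe as mp
    import numpy as np

    cap = cv2.VideoCapture(str(video_path))
    cap.set(cv2.CAP_PROP_POS_MSEC, start_s * 1000.0)
    rows, frame_no = [], 0
    with mp.solutions.holistic.Holistic(static_image_mode=False) as holistic:
        while cap.get(cv2.CAP_PROP_POS_MSEC) <= end_s * 1000.0:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_no % frame_skip == 0:
                # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
                results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                rows.append(
                    flatten_landmarks(results.pose_landmarks, pose_idx)
                    + flatten_landmarks(results.face_landmarks, face_idx)
                    + flatten_landmarks(results.left_hand_landmarks, hand_idx)
                    + flatten_landmarks(results.right_hand_landmarks, hand_idx)
                )
            frame_no += 1
    cap.release()
    np.save(out_path, np.asarray(rows, dtype=np.float32))
```

Zero-filling keeps every row the same length, so the saved array has shape `(frames, features)` even when a hand leaves the frame.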
30+
31+
## How2Sign Dataset Processing
32+
33+
For the How2Sign dataset, the repository provides two scripts for MediaPipe landmark extraction:
34+
35+
### Clip-Based Processing
36+
37+
The clip processor (`H2S_clip_mediapipe.py`) handles complete video clips in a single pass. It uses adaptive frame skipping to reduce processing time while maintaining data quality, and processes multiple clips in parallel.
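One way to read "adaptive frame skipping" is to widen the skip step for long clips so that no clip exceeds a frame budget. The sketch below shows that interpretation together with a process pool for running clips in parallel; the function names and the exact rule are assumptions, not the behaviour of `H2S_clip_mediapipe.py`.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def frame_step(total_frames, base_skip, max_frames):
    """Pick a step so that at most max_frames frames are kept from a clip."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    needed = math.ceil(total_frames / max_frames)
    return max(base_skip, needed, 1)


def process_clips(clip_paths, worker, max_workers=4):
    """Run `worker` (a picklable top-level function) on each clip in parallel."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, clip_paths))
```

With `base_skip=2` and `max_frames=100`, a 50-frame clip keeps every second frame, while a 1000-frame clip is sampled every tenth frame via `range(0, total_frames, step)`.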
38+
39+
### Raw Video Processing
40+
41+
The raw video processor (`H2S_raw_mediapipe.py`) works at a finer granularity, using the realigned timestamps from a CSV file. It extracts landmarks only for the video segments those timestamps define, which preserves temporal alignment, and processes segments in parallel.
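Turning realigned timestamps into frame ranges might look like the sketch below. The column names (`VIDEO_NAME`, `SENTENCE_NAME`, `START_REALIGNED`, `END_REALIGNED`) and the tab delimiter are assumptions about the CSV layout, and `read_segments` is a hypothetical helper.

```python
import csv


def read_segments(csv_file, fps, delimiter="\t"):
    """Map each row's start/end time in seconds to first/last frame indices."""
    segments = []
    for row in csv.DictReader(csv_file, delimiter=delimiter):
        start_s = float(row["START_REALIGNED"])
        end_s = float(row["END_REALIGNED"])
        if end_s <= start_s:
            continue  # skip empty or inverted segments
        segments.append({
            "video": row["VIDEO_NAME"],
            "segment": row["SENTENCE_NAME"],
            "first_frame": int(start_s * fps),
            "last_frame": int(end_s * fps),
        })
    return segments
```

Each resulting entry can then be handed to a landmark extractor as an independent unit of work, which is what makes per-segment parallelism straightforward.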
42+
43+
## Data Organization
44+
45+
The system organizes processed data into clearly defined formats:
46+
- Video content is stored as MP4 files for optimal quality and compatibility
47+
- Transcripts are maintained in JSON format for easy parsing and manipulation
48+
- Segment information is organized in CSV files for straightforward analysis
49+
- Extracted landmarks are preserved as NumPy arrays (.npy files) for efficient processing
50+
51+
## Technical Requirements
52+
53+
The system relies on several key Python libraries:
54+
- OpenCV (cv2) for video processing
55+
- MediaPipe for pose, face, and hand landmark extraction
56+
- NumPy for efficient numerical operations
57+
- Pandas for data manipulation
58+
- yt-dlp for video downloading
59+
- youtube-transcript-api for transcript retrieval
60+
61+
Together, these steps produce the videos, transcripts, segment tables, and landmark arrays that ASL translation tasks build on.
