11# ASL Dataset Preprocessing Pipeline
22
3- [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
4- [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
5- [![Code style: black](https://img.shields.io/badge/code%20style-modular-black.svg)](https://github.com/psf/black)
6-
73A professional, modular pipeline for preprocessing **American Sign Language (ASL)** datasets, supporting both **YouTube-ASL** and **How2Sign** datasets. This project implements the methodology from ["YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus" (Uthus et al., 2023)](https://arxiv.org/abs/2306.15162).
84
95The pipeline handles the complete workflow from video acquisition to landmark extraction, preparing data for ASL translation tasks using **MediaPipe Holistic** and **MMPose RTMPose3D**.
106
117---
128
13- ## 📋 Table of Contents
9+ ## Table of Contents
1410
1511- [Features](#-features)
1612- [Project Structure](#-project-structure)
@@ -29,20 +25,20 @@ The pipeline handles the complete workflow from video acquisition to landmark ex
2925
3026---
3127
32- ## ✨ Features
28+ ## Features
3329
34- - 🎯 **Modular Architecture** - Clean separation of concerns with reusable components
35- - 🔄 **Two Landmark Extractors** - MediaPipe Holistic and MMPose RTMPose3D support
36- - 📊 **Dual Dataset Support** - Works with YouTube-ASL and How2Sign datasets
37- - ⚡ **Parallel Processing** - Multi-worker support for efficient video processing
38- - 🎬 **Smart Frame Sampling** - Configurable FPS reduction and frame skipping
39- - 📝 **Comprehensive Logging** - Detailed progress tracking and error reporting
40- - 🔧 **Flexible Configuration** - Script-specific config files for easy customization
41- - 📦 **Production Ready** - Type hints, docstrings, and error handling throughout
30+ - **Modular Architecture** - Clean separation of concerns with reusable components
31+ - **Two Landmark Extractors** - MediaPipe Holistic and MMPose RTMPose3D support
32+ - **Dual Dataset Support** - Works with YouTube-ASL and How2Sign datasets
33+ - **Parallel Processing** - Multi-worker support for efficient video processing
34+ - **Smart Frame Sampling** - Configurable FPS reduction and frame skipping
35+ - **Comprehensive Logging** - Detailed progress tracking and error reporting
36+ - **Flexible Configuration** - Script-specific config files for easy customization
37+ - **Production Ready** - Type hints, docstrings, and error handling throughout
4238
4339---
4440
45- ## 📁 Project Structure
41+ ## Project Structure
4642
4743```
4844ASL-Dataset-Preprocess/
@@ -94,7 +90,7 @@ ASL-Dataset-Preprocess/
9490
9591---
9692
97- ## 🔧 Prerequisites
93+ ## Prerequisites
9894
9995### System Requirements
10096
@@ -103,18 +99,9 @@ ASL-Dataset-Preprocess/
10399- **GPU**: CUDA-compatible GPU recommended for MMPose (optional for MediaPipe)
104100- **Storage**: ~100GB+ for datasets and models
105101
106- ### Core Dependencies
107-
108- - **MediaPipe** - Holistic body landmark detection
109- - **MMPose** - Advanced 3D pose estimation (optional)
110- - **OpenCV** - Video processing
111- - **NumPy** - Numerical operations
112- - **Pandas** - Data manipulation
113- - **yt-dlp** - YouTube video downloading
114-
115102---
116103
117- ## 📦 Installation
104+ ## Installation
118105
119106### 1. Clone the Repository
120107
@@ -131,24 +118,33 @@ source venv/bin/activate # On Windows: venv\Scripts\activate
131118pip install -r requirements.txt
132119```
133120
134- ### 3. Download MMPose Model Checkpoints (If Using MMPose)
121+ ### 3. Download MMPose and the checkpoints (If Using MMPose)
135122
136123```bash
124+ pip install -U openmim
125+ mim install mmcv==2.0.1 mmengine==0.10.7 mmdet==3.1.0
126+ cd ..
127+ git clone https://github.com/open-mmlab/mmpose.git
128+ cd mmpose
129+ pip install -r requirements.txt
130+ pip install -v -e .
131+
132+ # add mmpose to the pythonpath
133+ echo 'export PYTHONPATH="/your/path/to/mmpose:$PYTHONPATH"' >> ~/.bashrc
134+
137135# Create checkpoint directory
138- mkdir -p models/checkpoints
136+ cd path/of/this/project/models/checkpoints
139137
140138# Download RTMPose3D model (whole-body 3D pose)
141- wget https://download.openmmlab.com/mmpose/v1/wholebody_3d_keypoint/rtmw3d/rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth \
142- -O models/checkpoints/rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth
139+ wget https://download.openmmlab.com/mmpose/v1/wholebody_3d_keypoint/rtmw3d/rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth
143140
144141# Download RTMDet model (person detection)
145- wget https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth \
146- -O models/checkpoints/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth
142+ wget https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth
147143```
148144
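After the downloads above, a quick check can confirm that both checkpoint files landed where the extraction script will look for them. The following is a minimal sketch, not part of the repository: the filenames are copied from the `wget` commands above, while the default directory `models/checkpoints` and the helper name `missing_checkpoints` are assumptions made for illustration.

```python
from pathlib import Path

# Filenames copied from the wget commands above.
EXPECTED_CHECKPOINTS = (
    "rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth",
    "rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth",
)


def missing_checkpoints(checkpoint_dir="models/checkpoints"):
    """Return the expected checkpoint filenames that are absent from checkpoint_dir.

    Hypothetical helper: the default directory is an assumption, adjust it to
    wherever the files were actually downloaded.
    """
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

If the list is non-empty, re-run the corresponding `wget` command from inside the checkpoint directory.
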
149145---
150146
151- ## 🚀 Quick Start
147+ ## Quick Start
152148
153149### YouTube-ASL Pipeline
154150
@@ -185,99 +181,7 @@ python scripts/3b_extract_mmpose.py # MMPose
185181
186182---
187183
188- ## ⚙️ Configuration
189-
190- Each pipeline script has its own configuration file in `configs/`:
191-
192- ### `configs/download.py` - YouTube Download Settings
193-
194- ```python
195- # Video ID source
196- VIDEO_ID_FILE = "assets/youtube-asl_youtube_asl_video_ids.txt"
197-
198- # Download directories
199- VIDEO_DIR = "dataset/origin/"
200- TRANSCRIPT_DIR = "dataset/transcript/"
201-
202- # YouTube download settings
203- YT_CONFIG = {
204-     "format": "worstvideo[height>=720]/bestvideo[height<=480]",
205-     "limit_rate": "5M",  # Limit to 5 MB/s
206-     # ... more settings
207- }
208-
209- # Supported languages for transcripts
210- LANGUAGE = ["en", "ase", "en-US", ...]
211- ```
212-
213- ### `configs/build_manifest.py` - Transcript Processing
214-
215- ```python
216- # Input/Output paths
217- VIDEO_ID_FILE = "assets/youtube-asl_youtube_asl_video_ids.txt"
218- TRANSCRIPT_DIR = "dataset/transcript/"
219- OUTPUT_CSV = "assets/youtube_asl.csv"
220-
221- # Filtering constraints
222- MAX_TEXT_LENGTH = 300  # characters
223- MIN_DURATION = 0.2  # seconds
224- MAX_DURATION = 60.0  # seconds
225- ```
226-
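To make the three filtering constraints concrete, here is a minimal sketch of how a single caption segment could be tested against them. It is not the repository's implementation: the function name `keep_segment`, the inclusive bounds, and the definition of duration as `end - start` in seconds are all assumptions.

```python
MAX_TEXT_LENGTH = 300  # characters
MIN_DURATION = 0.2     # seconds
MAX_DURATION = 60.0    # seconds


def keep_segment(text, start, end):
    """Return True if a caption segment satisfies all three constraints.

    Assumptions (not confirmed by the source): bounds are inclusive and the
    duration is simply end - start, both given in seconds.
    """
    duration = end - start
    return len(text) <= MAX_TEXT_LENGTH and MIN_DURATION <= duration <= MAX_DURATION


print(keep_segment("Hello, my name is Sam.", 1.0, 2.5))  # short text, 1.5 s long
print(keep_segment("Hi", 0.0, 0.1))                      # too short in time
print(keep_segment("x" * 301, 0.0, 5.0))                 # too many characters
```
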
227- ### `configs/extract_mediapipe.py` - MediaPipe Extraction
228-
229- ```python
230- # Data paths
231- CSV_FILE = "dataset/how2sign/how2sign_realigned_val.csv"
232- VIDEO_DIR = "dataset/origin/"
233- NPY_DIR = "dataset/npy/"
234-
235- # Frame sampling
236- REDUCE_FPS_TO = 24.0  # Target FPS (None to disable)
237- FRAME_SKIP = 2  # Skip every Nth frame (when not using REDUCE_FPS_TO)
238- ACCEPT_VIDEO_FPS_WITHIN = (24.0, 60.0)  # Valid FPS range
239-
240- # Processing
241- MAX_WORKERS = 4  # Parallel workers
242-
243- # Landmark selection (from YouTube-ASL paper)
244- POSE_IDX = [11, 12, 13, 14, 23, 24]  # Shoulders, elbows, hips
245- FACE_IDX = [0, 4, 13, 14, 17, ...]  # 37 facial landmarks
246- HAND_IDX = list(range(21))  # All hand landmarks
247- ```
248-
249- ### `configs/extract_mmpose.py` - MMPose 3D Extraction
250-
251- ```python
252- # Data paths
253- CSV_FILE = "dataset/how2sign/how2sign_realigned_val.csv"
254- VIDEO_DIR = "dataset/origin/"
255- NPY_DIR = "dataset/npy/"
256-
257- # Frame sampling
258- REDUCE_FPS_TO = 24.0
259- FRAME_SKIP = 2
260- ACCEPT_VIDEO_FPS_WITHIN = (24.0, 60.0)
261- MAX_WORKERS = 4
262-
263- # Keypoint selection (85 keypoints from COCO-WholeBody)
264- COCO_WHOLEBODY_IDX = [5, 6, 7, 8, 11, 12, ...]
265-
266- # Model paths
267- POSE_MODEL_CHECKPOINT = "models/checkpoints/rtmw3d-l_..."
268- DET_MODEL_CHECKPOINT = "models/checkpoints/rtmdet_m_..."
269-
270- # Output format
271- ADD_VISIBLE = True  # Include visibility scores
272-
273- # Inference parameters
274- BBOX_THR = 0.5  # Person detection threshold
275- KPT_THR = 0.3  # Keypoint confidence threshold
276- ```
277-
278- ---
279-
280- ## 🔄 Pipeline Stages
184+ ## Pipeline Stages
281185
283187### Stage 1: Data Acquisition (`1_download_data.py`)
283187
@@ -351,7 +255,7 @@ Extracts 3D pose landmarks using MMPose RTMPose3D.
351255
352256---
353257
354- ## 📚 Dataset Information
258+ ## Dataset Information
355259
356260### YouTube-ASL Dataset
357261
@@ -394,155 +298,10 @@ Extracts 3D pose landmarks using MMPose RTMPose3D.
394298
395299---
396300
397- ## 🔬 Advanced Usage
398-
399- ### Custom Landmark Selection
400-
401- Edit landmark indices in config files to extract different keypoints:
402-
403- ```python
404- # configs/extract_mediapipe.py
405-
406- # Example: Extract only hands (no pose, no face)
407- POSE_IDX = []  # Empty - skip pose
408- FACE_IDX = []  # Empty - skip face
409- HAND_IDX = list(range(21))  # All hand landmarks
410-
411- # Output will be: 21 left + 21 right = 42 landmarks × 3 coords = 126 features
412- ```
413-
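The feature count quoted in the comment above follows from simple arithmetic, and the same arithmetic reproduces the array widths mentioned elsewhere in this README (255 and 340). The sketch below is illustrative only: `feature_dim` is a hypothetical helper, `FACE_IDX` is a stand-in list carrying just the count of 37 (the real indices are elided in the config), and treating the visibility score as one extra value per keypoint is an assumption.

```python
def feature_dim(pose_idx, face_idx, hand_idx, coords=3):
    """Values per frame: (pose + face + left hand + right hand) * coords."""
    n_landmarks = len(pose_idx) + len(face_idx) + 2 * len(hand_idx)
    return n_landmarks * coords


POSE_IDX = [11, 12, 13, 14, 23, 24]  # 6 pose landmarks, as in the default config
FACE_IDX = list(range(37))           # stand-in: only the count (37) matters here
HAND_IDX = list(range(21))           # 21 landmarks per hand

print(feature_dim(POSE_IDX, FACE_IDX, HAND_IDX))            # 85 landmarks * 3 = 255
print(feature_dim([], [], HAND_IDX))                        # 42 landmarks * 3 = 126
print(feature_dim(POSE_IDX, FACE_IDX, HAND_IDX, coords=4))  # 85 landmarks * 4 = 340
```

The last line assumes a fourth value per keypoint (for example a visibility score), which is consistent with the 85-keypoint MMPose selection and the `(T, 340)` shape, but is not confirmed by the source.
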
414- ### Adjust Frame Sampling
415-
416- Control processing speed vs. temporal resolution:
417-
418- ```python
419- # configs/extract_mediapipe.py
420-
421- # Option 1: Fixed target FPS (recommended)
422- REDUCE_FPS_TO = 15.0 # Downsample all videos to 15 FPS
423- FRAME_SKIP = 1 # Not used when REDUCE_FPS_TO is set
424-
425- # Option 2: Skip every Nth frame
426- REDUCE_FPS_TO = None # Disable FPS reduction
427- FRAME_SKIP = 3 # Sample every 3rd frame (1/3 rate)
428- ```
429-
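The two sampling options can be compared by looking at which source frame indices each one keeps. The following sketch is one plausible reading of the settings, not the pipeline's actual logic: `sampled_frame_indices` is a hypothetical helper, `REDUCE_FPS_TO` is assumed to take precedence over `FRAME_SKIP`, and `FRAME_SKIP = N` is assumed to mean "keep every Nth frame".

```python
def sampled_frame_indices(n_frames, source_fps, reduce_fps_to=None, frame_skip=1):
    """Return the indices of the source frames that would be kept.

    Assumed semantics (not confirmed by the source):
    - If reduce_fps_to is set, frames are picked at evenly spaced positions so
      that the output rate approximates reduce_fps_to; frame_skip is ignored.
    - Otherwise every frame_skip-th frame is kept, starting at frame 0.
    """
    if reduce_fps_to is not None:
        if reduce_fps_to >= source_fps:
            return list(range(n_frames))  # nothing to reduce
        step = source_fps / reduce_fps_to  # source frames per kept frame
        n_out = int(n_frames / step)
        return [int(k * step) for k in range(n_out)]
    return list(range(0, n_frames, max(1, frame_skip)))


# Option 1: 30 FPS source reduced to 15 FPS keeps every second frame.
print(sampled_frame_indices(10, 30.0, reduce_fps_to=15.0))  # [0, 2, 4, 6, 8]
# Non-integer ratio: 60 FPS source reduced to 24 FPS.
print(sampled_frame_indices(10, 60.0, reduce_fps_to=24.0))  # [0, 2, 5, 7]
# Option 2: no FPS target, keep every 3rd frame.
print(sampled_frame_indices(10, 30.0, frame_skip=3))        # [0, 3, 6, 9]
```

A fixed target FPS gives the same temporal resolution across videos with different source frame rates, which is presumably why it is marked as recommended above; a fixed skip keeps the output rate proportional to the source rate.
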
430- ### Parallel Processing Tuning
431-
432- Adjust worker count based on your hardware:
433-
434- ```python
435- # configs/extract_mediapipe.py or extract_mmpose.py
436-
437- # CPU-bound (MediaPipe)
438- MAX_WORKERS = 4 # Typically CPU cores - 1
439-
440- # GPU-bound (MMPose)
441- MAX_WORKERS = 2 # Fewer workers due to GPU memory constraints
442- ```
443-
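The rule of thumb in the comments above ("CPU cores - 1", fewer workers when GPU-bound) can be turned into a small helper for picking a starting value. This is a sketch under assumptions: `default_max_workers` is a hypothetical name, and the GPU cap of 2 simply mirrors the example value above rather than any measured limit.

```python
import os


def default_max_workers(gpu_bound=False, gpu_cap=2):
    """Suggest a starting MAX_WORKERS value.

    CPU-bound work: one worker per core, leaving one core free.
    GPU-bound work: additionally capped at gpu_cap (an assumed value; tune it
    to the available GPU memory).
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    cpu_workers = max(1, cores - 1)
    return min(cpu_workers, gpu_cap) if gpu_bound else cpu_workers


print("MediaPipe (CPU-bound):", default_max_workers())
print("MMPose (GPU-bound):", default_max_workers(gpu_bound=True))
```
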
444- ### Filter Videos by FPS
445-
446- Skip videos with unusual frame rates:
447-
448- ```python
449- # configs/extract_mediapipe.py
450-
451- # Only process videos between 24-60 FPS
452- ACCEPT_VIDEO_FPS_WITHIN = (24.0, 60.0)
453-
454- # Accept all frame rates
455- ACCEPT_VIDEO_FPS_WITHIN = (1.0, 120.0)
456- ```
457-
458- ---
459-
460- ## 🛠️ Troubleshooting
461-
462- ### Common Issues
463-
464- **1. Import Error: `cannot import name 'TooManyRequests'`**
465-
466- Update youtube-transcript-api:
467- ```bash
468- pip install --upgrade youtube-transcript-api
469- ```
470-
471- **2. MMPose Model Not Found**
472-
473- Download model checkpoints (see Installation section) or update paths in `configs/extract_mmpose.py`.
474-
475- **3. CUDA Out of Memory (MMPose)**
476-
477- Reduce `MAX_WORKERS` in `configs/extract_mmpose.py`:
478- ```python
479- MAX_WORKERS = 1  # Process one video at a time
480- ```
481-
482- **4. Video Download Fails**
483-
484- Check if video is still available on YouTube. Update yt-dlp:
485- ```bash
486- pip install --upgrade yt-dlp
487- ```
488-
489- **5. Slow Processing**
490-
491- - Enable FPS reduction: Set `REDUCE_FPS_TO = 15.0`
492- - Increase `FRAME_SKIP` to sample fewer frames
493- - Reduce `MAX_WORKERS` if system is overloaded
494-
495- ### Debug Mode
496-
497- Enable detailed logging:
498-
499- ```python
500- # Add to scripts before running
501- import logging
502- logging.basicConfig(level=logging.DEBUG)
503- ```
504-
505- ### Validation
506-
507- Check output landmark arrays:
508-
509- ```python
510- import numpy as np
511-
512- # Load landmark array
513- landmarks = np.load("dataset/npy/video_id-001.npy")
514-
515- print(f"Shape: {landmarks.shape}")  # (T, 255) or (T, 340)
516- print(f"Min: {landmarks.min():.3f}")  # Should be ~-1 to 0
517- print(f"Max: {landmarks.max():.3f}")  # Should be ~1 to 2
518- print(f"Mean: {landmarks.mean():.3f}")  # Should be ~0 to 1
519- print(f"Has NaN: {np.isnan(landmarks).any()}")  # Should be False
520- ```
521-
522- ---
523-
524- ## 📄 License
301+ ## License
525302
526303This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
527304
528305---
529306
530- ## 🙏 Acknowledgments
531-
532- - **YouTube-ASL Team** - For the dataset and methodology
533- - **How2Sign Team** - For the How2Sign dataset
534- - **MediaPipe Team** - For holistic body landmark detection
535- - **MMPose Team** - For advanced 3D pose estimation
536- - **OpenMMLab** - For the excellent computer vision framework
537-
538- ---
539-
540- ## 📞 Contact & Support
541-
542- - **Issues**: [GitHub Issues](https://github.com/yourusername/ASL-Dataset-Preprocess/issues)
543- - **Documentation**: See `REORGANIZATION_SUMMARY.md` for architecture details
544- - **Contributing**: Pull requests welcome!
545-
546- ---
547307
548- **Happy ASL Preprocessing! 🤟**