Commit c5a5e77
Add comprehensive documentation and visual examples
- Remove emojis from README.md for cleaner appearance
- Add pipeline stage documentation for all processing steps
- Include MediaPipe and MMPose extraction sample visualizations
- Document data acquisition, manifest building, and normalization stages
1 parent 9966847 commit c5a5e77

File tree: 8 files changed (+2331, -273 lines)

README.md

Lines changed: 32 additions & 273 deletions
@@ -1,16 +1,12 @@
 # ASL Dataset Preprocessing Pipeline
 
-[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
-[![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
-[![Code style: black](https://img.shields.io/badge/code%20style-modular-black.svg)](https://github.com/psf/black)
-
 A professional, modular pipeline for preprocessing **American Sign Language (ASL)** datasets, supporting both **YouTube-ASL** and **How2Sign** datasets. This project implements the methodology from ["YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus" (Uthus et al., 2023)](https://arxiv.org/abs/2306.15162).
 
 The pipeline handles the complete workflow from video acquisition to landmark extraction, preparing data for ASL translation tasks using **MediaPipe Holistic** and **MMPose RTMPose3D**.
 
 ---
 
-## 📋 Table of Contents
+## Table of Contents
 
 - [Features](#-features)
 - [Project Structure](#-project-structure)
@@ -29,20 +25,20 @@ The pipeline handles the complete workflow from video acquisition to landmark ex
 
 ---
 
-## Features
+## Features
 
-- 🎯 **Modular Architecture** - Clean separation of concerns with reusable components
-- 🔄 **Two Landmark Extractors** - MediaPipe Holistic and MMPose RTMPose3D support
-- 📊 **Dual Dataset Support** - Works with YouTube-ASL and How2Sign datasets
-- **Parallel Processing** - Multi-worker support for efficient video processing
-- 🎬 **Smart Frame Sampling** - Configurable FPS reduction and frame skipping
-- 📝 **Comprehensive Logging** - Detailed progress tracking and error reporting
-- 🔧 **Flexible Configuration** - Script-specific config files for easy customization
-- 📦 **Production Ready** - Type hints, docstrings, and error handling throughout
+- **Modular Architecture** - Clean separation of concerns with reusable components
+- **Two Landmark Extractors** - MediaPipe Holistic and MMPose RTMPose3D support
+- **Dual Dataset Support** - Works with YouTube-ASL and How2Sign datasets
+- **Parallel Processing** - Multi-worker support for efficient video processing
+- **Smart Frame Sampling** - Configurable FPS reduction and frame skipping
+- **Comprehensive Logging** - Detailed progress tracking and error reporting
+- **Flexible Configuration** - Script-specific config files for easy customization
+- **Production Ready** - Type hints, docstrings, and error handling throughout
 
 ---
 
-## 📁 Project Structure
+## Project Structure
 
 ```
 ASL-Dataset-Preprocess/
@@ -94,7 +90,7 @@ ASL-Dataset-Preprocess/
 
 ---
 
-## 🔧 Prerequisites
+## Prerequisites
 
 ### System Requirements
 
@@ -103,18 +99,9 @@ ASL-Dataset-Preprocess/
 - **GPU**: CUDA-compatible GPU recommended for MMPose (optional for MediaPipe)
 - **Storage**: ~100GB+ for datasets and models
 
-### Core Dependencies
-
-- **MediaPipe** - Holistic body landmark detection
-- **MMPose** - Advanced 3D pose estimation (optional)
-- **OpenCV** - Video processing
-- **NumPy** - Numerical operations
-- **Pandas** - Data manipulation
-- **yt-dlp** - YouTube video downloading
-
 ---
 
-## 📦 Installation
+## Installation
 
 ### 1. Clone the Repository
 
131118
pip install -r requirements.txt
132119
```
133120

134-
### 3. Download MMPose Model Checkpoints (If Using MMPose)
121+
### 3. Download MMPose and the checkpoints (If Using MMPose)
135122

136123
```bash
124+
pip install -U openmim
125+
mim install mmcv==2.0.1 mmengine==0.10.7 mmdet==3.1.0
126+
cd ..
127+
git clone https://github.com/open-mmlab/mmpose.git
128+
cd mmpose
129+
pip install -r requirements.txt
130+
pip install -v -e .
131+
132+
# add mmpose to the pythonpath
133+
echo 'export PYTHONPATH="/your/path/to/mmpose:$PYTHONPATH"' >> ~/.bashrc
134+
137135
# Create checkpoint directory
138-
mkdir -p models/checkpoints
136+
cd path/of/this/project/models/checkpoints
139137

140138
# Download RTMPose3D model (whole-body 3D pose)
141-
wget https://download.openmmlab.com/mmpose/v1/wholebody_3d_keypoint/rtmw3d/rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth \
142-
-O models/checkpoints/rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth
139+
wget https://download.openmmlab.com/mmpose/v1/wholebody_3d_keypoint/rtmw3d/rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth
143140

144141
# Download RTMDet model (person detection)
145-
wget https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth \
146-
-O models/checkpoints/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth
142+
wget https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth
147143
```
148144

149145
---
150146

151-
## 🚀 Quick Start
147+
## Quick Start
152148

153149
### YouTube-ASL Pipeline
154150

@@ -185,99 +181,7 @@ python scripts/3b_extract_mmpose.py # MMPose
185181

186182
---
187183

188-
## ⚙️ Configuration
189-
190-
Each pipeline script has its own configuration file in `configs/`:
191-
192-
### `configs/download.py` - YouTube Download Settings
193-
194-
```python
195-
# Video ID source
196-
VIDEO_ID_FILE = "assets/youtube-asl_youtube_asl_video_ids.txt"
197-
198-
# Download directories
199-
VIDEO_DIR = "dataset/origin/"
200-
TRANSCRIPT_DIR = "dataset/transcript/"
201-
202-
# YouTube download settings
203-
YT_CONFIG = {
204-
"format": "worstvideo[height>=720]/bestvideo[height<=480]",
205-
"limit_rate": "5M", # Limit to 5 MB/s
206-
# ... more settings
207-
}
208-
209-
# Supported languages for transcripts
210-
LANGUAGE = ["en", "ase", "en-US", ...]
211-
```
212-
213-
### `configs/build_manifest.py` - Transcript Processing
214-
215-
```python
216-
# Input/Output paths
217-
VIDEO_ID_FILE = "assets/youtube-asl_youtube_asl_video_ids.txt"
218-
TRANSCRIPT_DIR = "dataset/transcript/"
219-
OUTPUT_CSV = "assets/youtube_asl.csv"
220-
221-
# Filtering constraints
222-
MAX_TEXT_LENGTH = 300 # characters
223-
MIN_DURATION = 0.2 # seconds
224-
MAX_DURATION = 60.0 # seconds
225-
```
226-
227-
### `configs/extract_mediapipe.py` - MediaPipe Extraction
228-
229-
```python
230-
# Data paths
231-
CSV_FILE = "dataset/how2sign/how2sign_realigned_val.csv"
232-
VIDEO_DIR = "dataset/origin/"
233-
NPY_DIR = "dataset/npy/"
234-
235-
# Frame sampling
236-
REDUCE_FPS_TO = 24.0 # Target FPS (None to disable)
237-
FRAME_SKIP = 2 # Skip every Nth frame (when not using REDUCE_FPS_TO)
238-
ACCEPT_VIDEO_FPS_WITHIN = (24.0, 60.0) # Valid FPS range
239-
240-
# Processing
241-
MAX_WORKERS = 4 # Parallel workers
242-
243-
# Landmark selection (from YouTube-ASL paper)
244-
POSE_IDX = [11, 12, 13, 14, 23, 24] # Shoulders, elbows, hips
245-
FACE_IDX = [0, 4, 13, 14, 17, ...] # 37 facial landmarks
246-
HAND_IDX = list(range(21)) # All hand landmarks
247-
```
248-
249-
### `configs/extract_mmpose.py` - MMPose 3D Extraction
250-
251-
```python
252-
# Data paths
253-
CSV_FILE = "dataset/how2sign/how2sign_realigned_val.csv"
254-
VIDEO_DIR = "dataset/origin/"
255-
NPY_DIR = "dataset/npy/"
256-
257-
# Frame sampling
258-
REDUCE_FPS_TO = 24.0
259-
FRAME_SKIP = 2
260-
ACCEPT_VIDEO_FPS_WITHIN = (24.0, 60.0)
261-
MAX_WORKERS = 4
262-
263-
# Keypoint selection (85 keypoints from COCO-WholeBody)
264-
COCO_WHOLEBODY_IDX = [5, 6, 7, 8, 11, 12, ...]
265-
266-
# Model paths
267-
POSE_MODEL_CHECKPOINT = "models/checkpoints/rtmw3d-l_..."
268-
DET_MODEL_CHECKPOINT = "models/checkpoints/rtmdet_m_..."
269-
270-
# Output format
271-
ADD_VISIBLE = True # Include visibility scores
272-
273-
# Inference parameters
274-
BBOX_THR = 0.5 # Person detection threshold
275-
KPT_THR = 0.3 # Keypoint confidence threshold
276-
```
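One detail worth spelling out from the two extraction configs above: the landmark selection fixes the per-frame feature width of the saved `.npy` arrays, which is where the `(T, 255)` and `(T, 340)` shapes in the Validation snippet further down come from. A short sketch of that arithmetic, assuming x/y/z per landmark, one extra visibility value per keypoint when `ADD_VISIBLE = True`, and that the 85 COCO-WholeBody keypoints follow the same 6 + 37 + 42 split:

```python
# Feature-width arithmetic implied by the landmark selections above (illustrative).
POSE = 6          # POSE_IDX: shoulders, elbows, hips
FACE = 37         # FACE_IDX: 37 facial landmarks
HANDS = 21 + 21   # HAND_IDX applied to the left and right hands

keypoints = POSE + FACE + HANDS       # 85 keypoints per frame
mediapipe_width = keypoints * 3       # x, y, z              -> 255
mmpose_width = keypoints * (3 + 1)    # x, y, z + visibility -> 340

print(keypoints, mediapipe_width, mmpose_width)  # 85 255 340
```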
-
----
-
-## 🔄 Pipeline Stages
+## Pipeline Stages
 
 ### Stage 1: Data Acquisition (`1_download_data.py`)
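The body of this stage's documentation is collapsed in the diff, but the removed `configs/download.py` settings above outline what `1_download_data.py` is responsible for. The following is only a rough sketch of that step using yt-dlp's Python API; it is not the repository's implementation, the option names (`outtmpl`, `ratelimit`) are yt-dlp's own, and the format string and directories are copied from the config shown earlier:

```python
# Illustrative sketch of a yt-dlp based download step (not the project's code).
from pathlib import Path

from yt_dlp import YoutubeDL

VIDEO_DIR = Path("dataset/origin/")
VIDEO_DIR.mkdir(parents=True, exist_ok=True)

ydl_opts = {
    "format": "worstvideo[height>=720]/bestvideo[height<=480]",  # from configs/download.py
    "outtmpl": str(VIDEO_DIR / "%(id)s.%(ext)s"),                # one file per video id
    "ratelimit": 5_000_000,                                      # ~5 MB/s
    "quiet": True,
}

video_ids = Path("assets/youtube-asl_youtube_asl_video_ids.txt").read_text().split()

with YoutubeDL(ydl_opts) as ydl:
    for video_id in video_ids[:1]:  # start with a single id to test the setup
        ydl.download([f"https://www.youtube.com/watch?v={video_id}"])
```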

@@ -351,7 +255,7 @@ Extracts 3D pose landmarks using MMPose RTMPose3D.
 
 ---
 
-## 📚 Dataset Information
+## Dataset Information
 
 ### YouTube-ASL Dataset
 
@@ -394,155 +298,10 @@ Extracts 3D pose landmarks using MMPose RTMPose3D.
 
 ---
 
-## 🔬 Advanced Usage
-
-### Custom Landmark Selection
-
-Edit landmark indices in config files to extract different keypoints:
-
-```python
-# configs/extract_mediapipe.py
-
-# Example: Extract only hands (no pose, no face)
-POSE_IDX = []  # Empty - skip pose
-FACE_IDX = []  # Empty - skip face
-HAND_IDX = list(range(21))  # All hand landmarks
-
-# Output will be: 21 left + 21 right = 42 landmarks × 3 coords = 126 features
-```
-
-### Adjust Frame Sampling
-
-Control processing speed vs. temporal resolution:
-
-```python
-# configs/extract_mediapipe.py
-
-# Option 1: Fixed target FPS (recommended)
-REDUCE_FPS_TO = 15.0  # Downsample all videos to 15 FPS
-FRAME_SKIP = 1        # Not used when REDUCE_FPS_TO is set
-
-# Option 2: Skip every Nth frame
-REDUCE_FPS_TO = None  # Disable FPS reduction
-FRAME_SKIP = 3        # Sample every 3rd frame (1/3 rate)
-```
-
-### Parallel Processing Tuning
-
-Adjust worker count based on your hardware:
-
-```python
-# configs/extract_mediapipe.py or extract_mmpose.py
-
-# CPU-bound (MediaPipe)
-MAX_WORKERS = 4  # Typically CPU cores - 1
-
-# GPU-bound (MMPose)
-MAX_WORKERS = 2  # Fewer workers due to GPU memory constraints
-```
-
-### Filter Videos by FPS
-
-Skip videos with unusual frame rates:
-
-```python
-# configs/extract_mediapipe.py
-
-# Only process videos between 24-60 FPS
-ACCEPT_VIDEO_FPS_WITHIN = (24.0, 60.0)
-
-# Accept all frame rates
-ACCEPT_VIDEO_FPS_WITHIN = (1.0, 120.0)
-```
-
----
-
-## 🛠️ Troubleshooting
-
-### Common Issues
-
-**1. Import Error: `cannot import name 'TooManyRequests'`**
-
-Update youtube-transcript-api:
-```bash
-pip install --upgrade youtube-transcript-api
-```
-
-**2. MMPose Model Not Found**
-
-Download model checkpoints (see Installation section) or update paths in `configs/extract_mmpose.py`.
-
-**3. CUDA Out of Memory (MMPose)**
-
-Reduce `MAX_WORKERS` in `configs/extract_mmpose.py`:
-```python
-MAX_WORKERS = 1  # Process one video at a time
-```
-
-**4. Video Download Fails**
-
-Check if video is still available on YouTube. Update yt-dlp:
-```bash
-pip install --upgrade yt-dlp
-```
-
-**5. Slow Processing**
-
-- Enable FPS reduction: Set `REDUCE_FPS_TO = 15.0`
-- Increase `FRAME_SKIP` to sample fewer frames
-- Reduce `MAX_WORKERS` if system is overloaded
-
-### Debug Mode
-
-Enable detailed logging:
-
-```bash
-# Add to scripts before running
-import logging
-logging.basicConfig(level=logging.DEBUG)
-```
-
-### Validation
-
-Check output landmark arrays:
-
-```python
-import numpy as np
-
-# Load landmark array
-landmarks = np.load("dataset/npy/video_id-001.npy")
-
-print(f"Shape: {landmarks.shape}")  # (T, 255) or (T, 340)
-print(f"Min: {landmarks.min():.3f}")  # Should be ~-1 to 0
-print(f"Max: {landmarks.max():.3f}")  # Should be ~1 to 2
-print(f"Mean: {landmarks.mean():.3f}")  # Should be ~0 to 1
-print(f"Has NaN: {np.isnan(landmarks).any()}")  # Should be False
-```
-
----
-
-## 📄 License
+## License
 
 This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
 
 ---
 
-## 🙏 Acknowledgments
-
-- **YouTube-ASL Team** - For the dataset and methodology
-- **How2Sign Team** - For the How2Sign dataset
-- **MediaPipe Team** - For holistic body landmark detection
-- **MMPose Team** - For advanced 3D pose estimation
-- **OpenMMLab** - For the excellent computer vision framework
-
----
-
-## 📞 Contact & Support
-
-- **Issues**: [GitHub Issues](https://github.com/yourusername/ASL-Dataset-Preprocess/issues)
-- **Documentation**: See `REORGANIZATION_SUMMARY.md` for architecture details
-- **Contributing**: Pull requests welcome!
-
----
 
-**Happy ASL Preprocessing! 🤟**
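The "Adjust Frame Sampling" guidance removed above distinguishes two modes: resampling to a fixed target FPS (`REDUCE_FPS_TO`) versus keeping every Nth frame (`FRAME_SKIP`). The selection logic lives in the extraction scripts, but a minimal sketch of what such sampling typically looks like (hypothetical helper, not the repository's implementation) may help make the two options concrete:

```python
# Hypothetical frame-sampling helper illustrating the two modes described in the
# removed "Adjust Frame Sampling" section (not the repository's implementation).
from typing import List, Optional


def select_frame_indices(total_frames: int, src_fps: float,
                         reduce_fps_to: Optional[float] = None,
                         frame_skip: int = 1) -> List[int]:
    if reduce_fps_to is not None:
        # Mode 1: resample to a fixed target FPS by stepping through the source.
        step = src_fps / reduce_fps_to  # e.g. 30 / 15 -> keep every 2nd frame
        return [int(i * step) for i in range(int(total_frames / step))]
    # Mode 2: keep every Nth frame.
    return list(range(0, total_frames, frame_skip))


print(select_frame_indices(120, src_fps=30.0, reduce_fps_to=15.0)[:5])  # [0, 2, 4, 6, 8]
print(select_frame_indices(120, src_fps=30.0, frame_skip=3)[:5])        # [0, 3, 6, 9, 12]
```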

assets/mmpose_extracted_sample.gif (370 KB)
