Commit b3684be

Reorganize project into modular architecture with separate configs and reusable components
## New Structure
- configs/ - Script-specific configuration files (4 files)
  - download.py - YouTube download settings
  - build_manifest.py - Transcript processing config
  - extract_mediapipe.py - MediaPipe extraction config
  - extract_mmpose.py - MMPose 3D extraction config
- src/asl_prep/ - Core library modules
  - common/ - Shared utilities (files.py, video.py)
  - download/ - YouTube download logic
  - transcripts/ - Text preprocessing
  - pipeline/ - Task orchestration
  - extractors/ - Landmark extraction (base, mediapipe, mmpose)
- scripts/ - Executable entry points
  - 1_download_data.py - Download videos & transcripts
  - 2_build_manifest.py - Build manifest CSV
  - 3a_extract_mediapipe.py - Extract MediaPipe landmarks
  - 3b_extract_mmpose.py - Extract MMPose 3D landmarks
- assets/ - Demo files and metadata (moved from resource/)

## Key Improvements
- Eliminated ~200 lines of duplicate code (FPSSampler, file utils, validation)
- Separated concerns with script-specific configs instead of monolithic conf.py
- Created reusable modules following DRY principle
- Added comprehensive documentation with type hints and docstrings
- Implemented abstract base class for landmark extractors
- Improved error handling and logging throughout

## Breaking Changes
- Old scripts (s1_*.py, s2_*.py, s3_*.py) replaced with new scripts/ directory
- conf.py split into 4 separate config files in configs/
- resource/ directory renamed to assets/

## Files Changed
- Created: 23 Python modules + 8 __init__.py files (~2,500 lines)
- Updated: .gitignore, README.md
- Moved: resource/ → assets/
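The commit message mentions an abstract base class for landmark extractors and a shared FPSSampler, but this page does not show their code. The following is only a minimal sketch of how such a design might look. Every name and signature here (`FPSSampler.take`, `BaseExtractor`, `extract_frame`, `MeanExtractor`) is hypothetical and not taken from the repository, and the toy backend stands in for MediaPipe or MMPose.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence

Frame = Sequence[float]   # hypothetical stand-in for a decoded video frame
Landmarks = List[float]   # hypothetical flat landmark vector


class FPSSampler:
    """Keep roughly `target_fps` frames per second from a `src_fps` stream.

    Hypothetical reimplementation; the repository's FPSSampler may differ.
    """

    def __init__(self, src_fps: float, target_fps: float) -> None:
        if src_fps <= 0 or target_fps <= 0:
            raise ValueError("frame rates must be positive")
        # Never upsample: keep at most every frame.
        self.step = max(src_fps / target_fps, 1.0)
        self._next = 0.0

    def take(self, index: int) -> bool:
        """Return True if the frame at `index` should be kept."""
        if index >= self._next:
            self._next += self.step
            return True
        return False


class BaseExtractor(ABC):
    """Shared frame-sampling loop; subclasses supply the per-frame model call."""

    def __init__(self, src_fps: float, target_fps: float) -> None:
        self.src_fps = src_fps
        self.target_fps = target_fps

    @abstractmethod
    def extract_frame(self, frame: Frame) -> Landmarks:
        """Return landmarks for a single frame."""

    def extract(self, frames: Iterable[Frame]) -> List[Landmarks]:
        # A fresh sampler per call keeps extract() free of leftover state.
        sampler = FPSSampler(self.src_fps, self.target_fps)
        return [
            self.extract_frame(frame)
            for index, frame in enumerate(frames)
            if sampler.take(index)
        ]


class MeanExtractor(BaseExtractor):
    """Toy backend: the 'landmark' is just the mean of the frame values."""

    def extract_frame(self, frame: Frame) -> Landmarks:
        return [sum(frame) / len(frame)]


if __name__ == "__main__":
    demo_frames = [[float(i), float(i) + 2.0] for i in range(10)]
    print(MeanExtractor(src_fps=30, target_fps=10).extract(demo_frames))
```

In a layout like this, the sampling and iteration logic lives in one place, so each backend only implements `extract_frame`. That is one plausible way the duplicated code listed under Key Improvements could have been removed.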
1 parent f843fb6 commit b3684be

33 files changed

+625449 -1641 lines changed

.gitignore

Lines changed: 16 additions & 2 deletions
@@ -163,6 +163,20 @@ cython_debug/
 # Claude
 .claude/
 
-# data
-dataset
+# ============================================================================
+# ASL Dataset Preprocessing - Project-Specific Ignores
+# ============================================================================
+
+# Dataset directories (ignore content, keep structure)
+dataset/
+
+# Model checkpoints (large files, download separately)
+models/checkpoints/*.pth
+
+# Test notebooks and temporary scripts
+test.py
+
+# Temporary files
+*.tmp
+.DS_Store
 