CausalVideoAnnotator is designed to support researchers and practitioners in labeling videos with causal structures, going beyond conventional frame-level tags. It enables annotators to capture attributes and affordances by linking objects in a structured format. The toolkit is tailored for building high-quality datasets for causal reasoning, vision-language-action models, and embodied AI research. It was initiated by Yushun Xiang and Dingxiang Luo.
First, download the required model checkpoints. SAM-2.1 (or SAM-2) checkpoints can be obtained by running:
```shell
chmod +x hfd.sh
./hfd.sh facebook/sam2.1-hiera-large --local-dir facebook/sam2.1-hiera-large
```
Tips:
- To use the Hugging Face model downloader (`hfd.sh`), make sure `aria2` is installed beforehand.
Create a directory for your videos:

```shell
mkdir videos
```

Then move the videos you want to annotate into the `videos/` folder.
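If you have many clips, you can populate the `videos/<task>/<view>/` layout described below with a short script. This is only an illustrative sketch; the `place_video` helper and the task/view names in the usage example are hypothetical, not part of the toolkit:

```python
# Sketch: copy a loose video file into the videos/<task>/<view>/ layout.
# The function name and defaults are illustrative, not from this repo.
import shutil
from pathlib import Path


def place_video(src: Path, task: str, view: str,
                root: Path = Path("videos")) -> Path:
    """Copy src into root/<task>/<view>/ and return the destination path."""
    dest_dir = root / task / view
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return dest
```

Usage might look like `place_video(Path("clip.mp4"), "pouring", "front")`, which copies the file to `videos/pouring/front/clip.mp4`.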
This project uses uv to manage the Python virtual environment.

```shell
uv sync
uv run backend.py
```

Then open http://localhost:8000 in your browser.
- `backend.py`: FastAPI app entrypoint; serves `/static` and the API.
- `src/`: Python modules (`sam2_pipeline.py`, `utils.py`, components, views).
- `static/`: Frontend assets (`index.html`, `*.js`, `styles.css`).
- `videos/` and `annotations/`: Created at runtime; local storage for media and results. Place videos under `videos/<task>/<view>/file.mp4`.
- Other helpers: `transfer_video_files.py`, `utils.md`, `tools.txt`.
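As a sketch of how the `videos/<task>/<view>/file.mp4` layout can be consumed, a hypothetical helper (not part of the repo) might enumerate it like this:

```python
# Sketch: walk the videos/<task>/<view>/file.mp4 layout and collect
# (task, view, filename) triples. The function is illustrative only.
from pathlib import Path


def list_videos(root: Path = Path("videos")) -> list[tuple[str, str, str]]:
    """Return (task, view, filename) for every .mp4 two levels under root."""
    entries = []
    for path in sorted(root.glob("*/*/*.mp4")):
        task, view = path.parts[-3], path.parts[-2]
        entries.append((task, view, path.name))
    return entries
```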
- Create env (Python 3.10+): `python -m venv .venv && source .venv/bin/activate`
- Install deps (minimal): `pip install fastapi uvicorn opencv-python numpy torch torchvision Pillow python-multipart`
- Optional models/tools: SAM2/ByteTrack as noted in `README.md`.
- Run backend: `python backend.py` (serves at `http://localhost:8000`, static at `/static`).
- Alternative run: `uvicorn backend:app --reload` (from repo root).
- Python: PEP 8, 4-space indent, type hints where practical. Keep functions small and pure in `src/`.
- JavaScript: camelCase for identifiers; keep files lowercase with hyphens when multiword (e.g., `file-browser.js`).
- Paths: use `Path`/`os.path` and never assume absolute paths under `videos/` or `annotations/`.
- Docstrings/comments: concise; explain why over what.
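As an illustration of these conventions, a small pure helper in the suggested style might look like the following (the function is hypothetical, not taken from `src/utils.py`):

```python
# Illustrative only: a small, pure, type-hinted helper with a "why" docstring.
def frame_to_seconds(frame_index: int, fps: float) -> float:
    """Convert a frame index to a timestamp in seconds.

    Why: annotations store frame indices, but the UI displays timestamps;
    keeping the conversion in one pure function avoids per-view duplication.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_index / fps
```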
- Framework: pytest (suggested). Place tests under `tests/`, named `test_*.py`.
- Quick run: `pytest -q` (after `pip install pytest`).
- Aim for unit tests around `src/utils.py` and integration tests hitting FastAPI routes with `httpx`/`TestClient`.
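A unit-test module under `tests/` could follow the sketch below. The `clamp` helper under test is defined inline here so the example is self-contained; in the repo it would instead be imported from `src/utils.py`, and the name is hypothetical:

```python
# Sketch of a pytest module, e.g. tests/test_utils.py.
def clamp(value: int, low: int, high: int) -> int:
    """Clamp value into the inclusive range [low, high]."""
    return max(low, min(high, value))


def test_clamp_within_range():
    assert clamp(5, 0, 10) == 5


def test_clamp_below_and_above():
    assert clamp(-1, 0, 10) == 0
    assert clamp(99, 0, 10) == 10
```

`pytest -q` discovers and runs any `test_*` functions in files named `test_*.py` automatically.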
- Commits: imperative mood, concise subject. Prefer prefix tags when relevant: `[feat]`, `[fix]`, `[chore]`. Keep one logical change per commit.
- PRs: include summary, rationale, screenshots for UI changes, and reproduction steps. Link related issues and list test coverage or manual checks.
- Large models (SAM/trackers) are optional but GPU-recommended; document your setup.
- Backend enforces listing under `videos/`; keep inputs within that tree to avoid path traversal.
- Do not commit large media or model weights; use `.gitignore` and external storage.
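The kind of containment check the backend's listing enforces can be sketched as below; the function name and exact logic are assumptions for illustration, not code from `backend.py`:

```python
# Sketch: resolve a client-supplied path and reject anything that escapes
# the videos/ tree. Illustrative only; names are hypothetical.
from pathlib import Path


def resolve_under_videos(requested: str,
                         root: Path = Path("videos")) -> Path:
    """Resolve a relative path and ensure it stays inside root."""
    root = root.resolve()
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"path escapes {root}: {requested}")
    return target
```

Resolving both paths before comparing defeats `..` segments and symlink tricks, since the comparison happens on canonical absolute paths.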