@Cadene Cadene commented Apr 22, 2024

What does this PR do?

  • Convert datasets into video datasets
  • Add video decoding to LeRobotDataset while keeping backward compatibility
  • Benchmark the available settings and select the best: torchvision with yuv444p

More information on metrics and settings available here.

Example of benchmark

Effect of compression parameter of ffmpeg video encoding (crf) on:

  • compression_factor (higher is better): for instance, compression_factor=4 means that the video takes 4 times less memory space on disk compared to the original images.
  • load_time_factor (higher is better): for instance, load_time_factor=0.5 means that decoding from video is 2 times slower than loading the original images.
  • avg_per_pixel_l2_error (lower is better): reconstruction error
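These two ratio metrics are simple to compute from raw measurements; here is a minimal sketch (the function and variable names are hypothetical, not the benchmark's actual code):

```python
def compression_factor(images_size_bytes: int, video_size_bytes: int) -> float:
    """Higher is better: 15.0 means the video takes 15x less disk space."""
    return images_size_bytes / video_size_bytes

def load_time_factor(image_load_s: float, video_decode_s: float) -> float:
    """Higher is better: 0.5 means decoding is 2x slower than loading images."""
    return image_load_s / video_decode_s

# e.g. a 15 TB image dataset compressed to a 1 TB video dataset:
print(compression_factor(15_000, 1_000))  # -> 15.0
print(load_time_factor(1.0, 3.0))         # ~0.33, i.e. 3 times slower
```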

Note: We selected the default value for crf. In fact, as you can see below, crf=None compresses the image dataset by a factor of 15 (15TB -> 1TB), while keeping an acceptable reconstruction error and an acceptable loading slowdown (3 times slower).

Note: We also benchmarked the -g option, which controls the key frame interval. We found -g 2, i.e. 1 key frame every 2 frames, to give the best tradeoff while keeping loading fast enough:

  • 3 times slower for randomly loading 1 frame compared to loading from png images,
  • 2 times slower for loading 2 consecutive frames,
  • same speed for loading 6 consecutive frames.
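For reference, an encode with these settings could be assembled roughly as follows. This is a sketch: `build_ffmpeg_cmd` is a hypothetical helper, and the exact flags used by the PR's encoding script may differ.

```python
from typing import List, Optional

def build_ffmpeg_cmd(
    frames_pattern: str, out_path: str, crf: Optional[int] = None, g: int = 2
) -> List[str]:
    """Build an ffmpeg command encoding png frames to libx264 yuv444p video.

    crf=None keeps ffmpeg's default rate control; -g sets the key frame
    interval (g=2 means 1 key frame every 2 frames).
    """
    cmd = [
        "ffmpeg", "-i", frames_pattern,
        "-vcodec", "libx264",
        "-pix_fmt", "yuv444p",
        "-g", str(g),
    ]
    if crf is not None:
        cmd += ["-crf", str(crf)]
    return cmd + [out_path]

cmd = build_ffmpeg_cmd("frame_%06d.png", "episode_0.mp4")
# subprocess.run(cmd, check=True)  # requires ffmpeg on PATH
```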

Note: In this PR, we use pyav to decode on CPU. We want to explore faster ways to decode, such as:

  • multi-threading,
  • GPU decoding.
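Whatever the backend, random access boils down to seeking near the requested timestamp and returning the closest decoded frame. The index math can be sketched like this (`closest_frame_index` is a hypothetical helper, not the PR's actual code):

```python
def closest_frame_index(query_ts: float, fps: float, num_frames: int) -> int:
    """Map a query timestamp (in seconds) to the nearest valid frame index."""
    idx = round(query_ts * fps)
    return max(0, min(idx, num_frames - 1))

print(closest_frame_index(1.0, 30.0, 300))   # -> 30
print(closest_frame_index(99.0, 30.0, 300))  # -> 299 (clamped to the last frame)
```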
| crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|---|---|---|---|
| Baseline | 1.0 | 1.0 | 0.0000000 |
| 0 | 1.918 | 0.165 | 0.0000056 |
| 5 | 3.207 | 0.171 | 0.0000111 |
| 10 | 4.818 | 0.212 | 0.0000153 |
| 15 | 7.329 | 0.261 | 0.0000218 |
| 20 | 11.361 | 0.312 | 0.0000317 |
| None | 14.932 | 0.339 | 0.0000397 |
| 25 | 17.741 | 0.297 | 0.0000452 |
| 30 | 27.983 | 0.406 | 0.0000629 |
| 40 | 82.449 | 0.468 | 0.0001184 |
| 50 | 186.145 | 0.515 | 0.0001879 |
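The error column is an average per-pixel L2 distance between original and decoded frames; a hypothetical pure-Python reimplementation (not the benchmark's code, which presumably operates on arrays):

```python
import math

def avg_per_pixel_l2_error(original, decoded):
    """Mean Euclidean distance per pixel between two frames.

    Frames are nested lists of [r, g, b] values normalized to [0, 1].
    """
    n = 0
    total = 0.0
    for row_a, row_b in zip(original, decoded):
        for px_a, px_b in zip(row_a, row_b):
            total += math.dist(px_a, px_b)
            n += 1
    return total / n

frame = [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
print(avg_per_pixel_l2_error(frame, frame))  # -> 0.0 for identical frames
```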

How was it tested?

  • proper benchmark on sim and real-world datasets
  • reproduced results by retraining from scratch

Pusht

DATA_DIR=data python lerobot/scripts/train.py \
hydra.job.name=pusht_videos

python lerobot/scripts/train.py \
hydra.job.name=pusht_images

Similar success rate, training loss, reward, update time, GPU utilisation.

Aloha

DATA_DIR=data python lerobot/scripts/train.py \
env=aloha \
dataset.repo_id=lerobot/aloha_sim_insertion_human \
policy=act \
hydra.job.name=aloha_sim_insertion_human_videos

python lerobot/scripts/train.py \
env=aloha \
dataset.repo_id=lerobot/aloha_sim_insertion_human \
policy=act \
hydra.job.name=aloha_sim_insertion_human_images

Generate video datasets

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id pusht \
--raw-format pusht_zarr \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id xarm_lift_medium \
--raw-format xarm_pkl \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id aloha_sim_insertion_human \
--raw-format aloha_hdf5 \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id umi_cup_in_the_wild \
--raw-format umi_zarr \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

How to quickly test?

Add --debug 1 to previous push_dataset_to_hub.py commands.

DATA_DIR=tests/data python -m pytest tests/test_datasets.py::test_factory[pusht-lerobot/pusht-diffusion]

TODO

  • Video decoding on GPU:
    • decord is used in computer vision papers on video, but its last commit was 2 years ago
    • torchaudio also does GPU decoding through ffmpeg, but I couldn't make it work with yuv444p
    • PyNvCodec is from Nvidia and quite up-to-date; they also have a nice example with PyTorch.

@Cadene Cadene changed the base branch from main to user/rcadene/2024_04_21_refactor_dataset April 22, 2024 09:40
@aliberts aliberts added the dataset Issues regarding data inputs, processing, or datasets label Apr 24, 2024
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_refactor_dataset branch from da8f4f0 to 7626b9a Compare April 25, 2024 07:52
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from 9f841c3 to 388ae7c Compare April 25, 2024 09:53
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_refactor_dataset branch from 7626b9a to 338b36e Compare April 25, 2024 09:59
Base automatically changed from user/rcadene/2024_04_21_refactor_dataset to main April 25, 2024 10:23
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from 388ae7c to 832f1b3 Compare April 27, 2024 13:16
@Cadene Cadene self-assigned this Apr 28, 2024
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch 5 times, most recently from ee08eae to f2b3a0e Compare April 30, 2024 07:59
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from f2b3a0e to 77047d8 Compare April 30, 2024 12:32
@Cadene Cadene marked this pull request as ready for review April 30, 2024 12:51
Member

@lhoestq lhoestq left a comment
Great! I just left two comments about the video keys. IMO a minor change could be helpful to ensure easy support for videos in other libs and on HF

Contributor

@AdilZouitine AdilZouitine left a comment
LGTM! This is a very nice PR 📹. I have two minor comments and one suggestion 😄

Contributor

@alexander-soare alexander-soare left a comment
Looking forward to this landing! Sorry, I think I missed a lot of the action from a previous related PR. Just nits and comments from me, but generally approved for the time being.

@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from 5d982ef to 88ff197 Compare April 30, 2024 17:41
@Cadene Cadene commented May 2, 2024

@alexander-soare @aliberts @AdilZouitine I actually misread the loading time metric. Using the default values of ffmpeg, which are made for movies, leads to extremely slow loading times (25 times slower than images). It makes compute_stats during dataset creation a huge bottleneck, which is not acceptable.

I had to redo all benchmarks. I found that by setting the number of key frames to "1 every 2 frames", we get loading that is 3 times slower for totally random access of 1 frame, while ensuring a 15 times reduction in dataset size and acceptable reconstruction error.

Importantly, by retraining, I also validated that this reconstruction error doesn't affect success rate on pusht. I also validated that 3 times slower loading doesn't affect training time (because we have enough workers).

For TDMPC, loading 6 consecutive frames shouldn't be slower than loading png images.
If it is still too slow, we can:

  • preload all frames in RAM,
  • implement CUDA decoding (this requires a custom torchvision install).
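The first fallback could be as simple as memoizing decoded frames in RAM; a minimal sketch, where `decode_frame` is a placeholder for the real pyav/torchvision decode call:

```python
from functools import lru_cache

def decode_frame(video_path, frame_index):
    # Placeholder for the actual pyav/torchvision decode; returns a dummy frame.
    return (video_path, frame_index)

@lru_cache(maxsize=None)
def get_frame(video_path, frame_index):
    """Decode each frame at most once, then serve repeats from RAM."""
    return decode_frame(video_path, frame_index)

get_frame("episode_0.mp4", 0)
get_frame("episode_0.mp4", 0)  # second call is a cache hit
print(get_frame.cache_info().hits)  # -> 1
```

A real version would bound the cache (or preload whole episodes up front) so RAM usage stays predictable.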

@Cadene Cadene merged commit b2cda12 into main May 2, 2024
@Cadene Cadene deleted the user/rcadene/2024_04_21_load_from_video branch May 2, 2024 22:50
${MAKE} test-diffusion-ete-eval
${MAKE} test-tdmpc-ete-train
${MAKE} test-tdmpc-ete-eval
# TODO(rcadene, alexander-soare): enable end-to-end tests for tdmpc
Member
Is it an oversight?

menhguin pushed a commit to menhguin/lerobot that referenced this pull request Feb 9, 2025
Kalcy-U referenced this pull request in Kalcy-U/lerobot May 13, 2025
ZoreAnuj pushed a commit to luckyrobots/lerobot that referenced this pull request Jul 29, 2025
Ricci084 pushed a commit to JeffWang987/lerobot that referenced this pull request Sep 5, 2025