@Cadene Cadene commented Apr 22, 2024

What does this PR do?

  • Convert datasets into video datasets
  • Add video decoding to LeRobotDataset while keeping backward compatibility
  • Benchmark the available settings and select the best: torchvision with yuv444p

More information on metrics and settings available here.

Example of benchmark

Effect of compression parameter of ffmpeg video encoding (crf) on:

  • compression_factor (higher is better): for instance, compression_factor=4 means that the video takes 4 times less memory space on disk compared to the original images.
  • load_time_factor (higher is better): for instance, load_time_factor=0.5 means that decoding from video is 2 times slower than loading the original images.
  • avg_per_pixel_l2_error (lower is better): reconstruction error
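These two ratio metrics are simple to compute from raw measurements; here is a minimal sketch (the function and variable names are hypothetical, not the benchmark's actual code):

```python
def compression_factor(images_size_bytes: int, video_size_bytes: int) -> float:
    """Higher is better: 15.0 means the video takes 15x less disk space."""
    return images_size_bytes / video_size_bytes

def load_time_factor(image_load_s: float, video_decode_s: float) -> float:
    """Higher is better: 0.5 means decoding is 2x slower than loading images."""
    return image_load_s / video_decode_s

# e.g. a 15 TB image dataset compressed to a 1 TB video dataset:
print(compression_factor(15_000, 1_000))  # -> 15.0
print(load_time_factor(1.0, 3.0))         # ~0.33, i.e. 3 times slower
```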

Note: We selected the default value for crf. In fact, as you can see below, crf=None compresses the image dataset by a factor of 15 (15TB -> 1TB), while keeping an acceptable reconstruction error and an acceptable loading slowdown (3 times slower).

Note: We also benchmarked the -g option, which controls the key frame interval. We found -g 2, i.e. 1 key frame every 2 frames, to give the best tradeoff while keeping loading fast enough:

  • 3 times slower for randomly loading 1 frame compared to loading from png images,
  • 2 times slower for loading 2 consecutive frames,
  • same speed for loading 6 consecutive frames.
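For reference, an encode with these settings could be assembled roughly as follows. This is a sketch: `build_ffmpeg_cmd` is a hypothetical helper, and the exact flags used by the PR's encoding script may differ.

```python
from typing import List, Optional

def build_ffmpeg_cmd(
    frames_pattern: str, out_path: str, crf: Optional[int] = None, g: int = 2
) -> List[str]:
    """Build an ffmpeg command encoding png frames to libx264 yuv444p video.

    crf=None keeps ffmpeg's default rate control; -g sets the key frame
    interval (g=2 means 1 key frame every 2 frames).
    """
    cmd = [
        "ffmpeg", "-i", frames_pattern,
        "-vcodec", "libx264",
        "-pix_fmt", "yuv444p",
        "-g", str(g),
    ]
    if crf is not None:
        cmd += ["-crf", str(crf)]
    return cmd + [out_path]

cmd = build_ffmpeg_cmd("frame_%06d.png", "episode_0.mp4")
# subprocess.run(cmd, check=True)  # requires ffmpeg on PATH
```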

Note: In this PR, we use pyav to decode on CPU. We want to explore faster ways to decode, such as:

  • multi-threading,
  • GPU decoding.
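Whatever the backend, random access boils down to seeking near the requested timestamp and returning the closest decoded frame. The index math can be sketched like this (`closest_frame_index` is a hypothetical helper, not the PR's actual code):

```python
def closest_frame_index(query_ts: float, fps: float, num_frames: int) -> int:
    """Map a query timestamp (in seconds) to the nearest valid frame index."""
    idx = round(query_ts * fps)
    return max(0, min(idx, num_frames - 1))

print(closest_frame_index(1.0, 30.0, 300))   # -> 30
print(closest_frame_index(99.0, 30.0, 300))  # -> 299 (clamped to the last frame)
```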
| crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|---|---|---|---|
| Baseline | 1.0 | 1.0 | 0.0000000 |
| 0 | 1.918 | 0.165 | 0.0000056 |
| 5 | 3.207 | 0.171 | 0.0000111 |
| 10 | 4.818 | 0.212 | 0.0000153 |
| 15 | 7.329 | 0.261 | 0.0000218 |
| 20 | 11.361 | 0.312 | 0.0000317 |
| None | 14.932 | 0.339 | 0.0000397 |
| 25 | 17.741 | 0.297 | 0.0000452 |
| 30 | 27.983 | 0.406 | 0.0000629 |
| 40 | 82.449 | 0.468 | 0.0001184 |
| 50 | 186.145 | 0.515 | 0.0001879 |
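The error column is an average per-pixel L2 distance between original and decoded frames; a hypothetical pure-Python reimplementation (not the benchmark's code, which presumably operates on arrays):

```python
import math

def avg_per_pixel_l2_error(original, decoded):
    """Mean Euclidean distance per pixel between two frames.

    Frames are nested lists of [r, g, b] values normalized to [0, 1].
    """
    n = 0
    total = 0.0
    for row_a, row_b in zip(original, decoded):
        for px_a, px_b in zip(row_a, row_b):
            total += math.dist(px_a, px_b)
            n += 1
    return total / n

frame = [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
print(avg_per_pixel_l2_error(frame, frame))  # -> 0.0 for identical frames
```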

How was it tested?

  • proper benchmark on sim and real-world datasets
  • reproduced results by retraining from scratch

Pusht

DATA_DIR=data python lerobot/scripts/train.py \
hydra.job.name=pusht_videos

python lerobot/scripts/train.py \
hydra.job.name=pusht_images

Similar success rate, training loss, reward, update time, GPU utilisation.

Aloha

DATA_DIR=data python lerobot/scripts/train.py \
env=aloha \
dataset.repo_id=lerobot/aloha_sim_insertion_human \
policy=act \
hydra.job.name=aloha_sim_insertion_human_videos

python lerobot/scripts/train.py \
env=aloha \
dataset.repo_id=lerobot/aloha_sim_insertion_human \
policy=act \
hydra.job.name=aloha_sim_insertion_human_images

Generate video datasets

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id pusht \
--raw-format pusht_zarr \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id xarm_lift_medium \
--raw-format xarm_pkl \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id aloha_sim_insertion_human \
--raw-format aloha_hdf5 \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

python lerobot/scripts/push_dataset_to_hub.py \
--data-dir data \
--dataset-id umi_cup_in_the_wild \
--raw-format umi_zarr \
--community-id lerobot \
--revision v1.2 \
--dry-run 1 \
--save-to-disk 1 \
--save-tests-to-disk 1

How to quickly test?

Add --debug 1 to previous push_dataset_to_hub.py commands.

DATA_DIR=tests/data python -m pytest tests/test_datasets.py::test_factory[pusht-lerobot/pusht-diffusion]

TODO

  • Video decoding on GPU:
    • decord is used in computer vision papers on video, but its last commit was 2 years ago
    • torchaudio also does GPU decoding through ffmpeg, but I couldn't make it work with yuv444p
    • PyNvCodec is from Nvidia and quite up-to-date; they also have a nice example with PyTorch.

@Cadene Cadene changed the base branch from main to user/rcadene/2024_04_21_refactor_dataset April 22, 2024 09:40
@aliberts aliberts added the dataset Issues regarding data inputs, processing, or datasets label Apr 24, 2024
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_refactor_dataset branch from da8f4f0 to 7626b9a Compare April 25, 2024 07:52
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from 9f841c3 to 388ae7c Compare April 25, 2024 09:53
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_refactor_dataset branch from 7626b9a to 338b36e Compare April 25, 2024 09:59
Base automatically changed from user/rcadene/2024_04_21_refactor_dataset to main April 25, 2024 10:23
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from 388ae7c to 832f1b3 Compare April 27, 2024 13:16
@Cadene Cadene self-assigned this Apr 28, 2024
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch 5 times, most recently from ee08eae to f2b3a0e Compare April 30, 2024 07:59
@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from f2b3a0e to 77047d8 Compare April 30, 2024 12:32
@Cadene Cadene marked this pull request as ready for review April 30, 2024 12:51
Member

@lhoestq lhoestq left a comment
Great! I just left two comments about the video keys. IMO a minor change could be helpful to ensure easy support for videos in other libs and on HF

Contributor

@AdilZouitine AdilZouitine left a comment
LGTM! This is a very nice PR 📹. I have two minor comments and one suggestion 😄

Contributor

@alexander-soare alexander-soare left a comment
Looking forward to this landing! Sorry, I think I missed a lot of the action from a previous related PR. Just nits and comments from me, but generally approved for the time being.

@Cadene Cadene force-pushed the user/rcadene/2024_04_21_load_from_video branch from 5d982ef to 88ff197 Compare April 30, 2024 17:41
@Cadene Cadene commented May 2, 2024

@alexander-soare @aliberts @AdilZouitine I actually misread the loading time metric. Using the default values of ffmpeg, which are made for movies, leads to extremely slow loading times (25 times slower than images). It makes compute_stats during dataset creation a huge bottleneck, which is not acceptable.

I had to redo all benchmarks. I found that by setting the number of key frames to "1 every 2 frames", we get loading that is 3 times slower for totally random access of 1 frame, while ensuring a 15 times reduction in dataset size and acceptable reconstruction error.

Importantly, by retraining, I also validated that this reconstruction error doesn't affect success rate on pusht. I also validated that 3 times slower loading doesn't affect training time (because we have enough workers).

For TDMPC, loading 6 consecutive frames shouldn't be slower than loading png images.
If it is still too slow, we can:

  • preload all frames in RAM,
  • implement CUDA decoding (this requires a custom torchvision install).
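The first fallback could be as simple as memoizing decoded frames in RAM; a minimal sketch, where `decode_frame` is a placeholder for the real pyav/torchvision decode call:

```python
from functools import lru_cache

def decode_frame(video_path, frame_index):
    # Placeholder for the actual pyav/torchvision decode; returns a dummy frame.
    return (video_path, frame_index)

@lru_cache(maxsize=None)
def get_frame(video_path, frame_index):
    """Decode each frame at most once, then serve repeats from RAM."""
    return decode_frame(video_path, frame_index)

get_frame("episode_0.mp4", 0)
get_frame("episode_0.mp4", 0)  # second call is a cache hit
print(get_frame.cache_info().hits)  # -> 1
```

A real version would bound the cache (or preload whole episodes up front) so RAM usage stays predictable.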

@Cadene Cadene merged commit b2cda12 into main May 2, 2024
@Cadene Cadene deleted the user/rcadene/2024_04_21_load_from_video branch May 2, 2024 22:50
${MAKE} test-diffusion-ete-eval
${MAKE} test-tdmpc-ete-train
${MAKE} test-tdmpc-ete-eval
# TODO(rcadene, alexander-soare): enable end-to-end tests for tdmpc
Member
Is it an oversight?

menhguin pushed a commit to menhguin/lerobot that referenced this pull request Feb 9, 2025
Kalcy-U referenced this pull request in Kalcy-U/lerobot May 13, 2025
ZoreAnuj pushed a commit to luckyrobots/lerobot that referenced this pull request Jul 29, 2025
Ricci084 pushed a commit to JeffWang987/lerobot that referenced this pull request Sep 5, 2025