Add video decoding to LeRobotDataset #92
Great! I just left two comments about the video keys. IMO a minor change could help ensure easy support for videos in other libs and on HF.
LGTM! This is a very nice PR 📹. I have two minor comments and one suggestion 😄
Looking forward to this landing! Sorry, I think I missed a lot of the action from a previous related PR. Just got nits and comments from me, but generally approved for the time being.
@alexander-soare @aliberts @AdilZouitine I actually misread the loading time metric. Using the default values of ffmpeg, which are made for movies, leads to extremely slow loading time (25 times slower than images), so I had to redo all benchmarks. I found that by setting a key frame "every 2 frames", we get a 3 times slower loading time for totally random access of 1 frame, while ensuring a 15 times reduction in dataset size and an acceptable reconstruction error. Importantly, by retraining, I also validated that this reconstruction error doesn't affect success rate on pusht. I also validated that the 3 times slower loading time doesn't affect training time (because we have enough workers). For TDMPC, loading 6 consecutive frames shouldn't be slower than loading png images.
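To illustrate why the key frame interval matters so much for random access, here is a minimal sketch, not the PR's code. It assumes key frames sit at indices 0, g, 2g, ..., that there are no B-frames, and that the decoder must start from the nearest earlier key frame. The function names are hypothetical, and the interval of 250 is only an assumed "movie-style" value (it is commonly cited as the libx264 default).

```python
def frames_to_decode(target: int, keyframe_interval: int) -> int:
    """Frames decoded to reach frame `target`, starting from the nearest earlier key frame.

    Assumption: key frames are at indices 0, g, 2g, ... and there are no B-frames.
    """
    if target < 0 or keyframe_interval < 1:
        raise ValueError("target must be >= 0 and keyframe_interval >= 1")
    return target % keyframe_interval + 1


def frames_to_decode_window(start: int, length: int, keyframe_interval: int) -> int:
    """Frames decoded to return `length` consecutive frames beginning at `start`."""
    if length < 1:
        raise ValueError("length must be >= 1")
    # Frames between the key frame and `start` are decoded once, then the window itself.
    return frames_to_decode(start, keyframe_interval) - 1 + length


# Random access to frame 249 with a long, movie-style interval (assumed 250):
print(frames_to_decode(249, 250))  # 250
# Same frame with one key frame every 2 frames (-g 2):
print(frames_to_decode(249, 2))  # 2
# A window of 6 consecutive frames starting at frame 3 with -g 2:
print(frames_to_decode_window(3, 6, 2))  # 7
```

Under these assumptions, the worst case for a single random frame equals the key frame interval, and the overhead for a window of consecutive frames is at most one interval minus one frame.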
${MAKE} test-diffusion-ete-eval
${MAKE} test-tdmpc-ete-train
${MAKE} test-tdmpc-ete-eval
# TODO(rcadene, alexander-soare): enable end-to-end tests for tdmpc
Is it an oversight?
What does this PR do?
Add video decoding to LeRobotDataset while keeping backward compatibility
torchvision
yuv444p
More information on metrics and settings available here.
Example of benchmark
Effect of the compression parameter of ffmpeg video encoding (crf) on:
- compression_factor (higher is better): for instance, compression_factor=4 means that the video takes 4 times less space on disk than the original images.
- load_time_factor (higher is better): for instance, load_time_factor=0.5 means that decoding from video is 2 times slower than loading the original images.
- avg_per_pixel_l2_error (lower is better): reconstruction error.

Note: We selected the default value for crf. In fact, as you can see below, crf=None compresses the image dataset by a factor of 15 (15TB -> 1TB), while ensuring acceptable reconstruction error and an acceptable loading time reduction (3 times slower).

Note: We also primarily benchmarked the -g option, which adjusts the number of key frames. We found -g 2, for 1 key frame every 2 frames, to give the best tradeoff while keeping loading fast enough:

Note: In this PR, we use pyav to decode on CPU. We want to explore faster ways to decode like:

crf | comp. | load. | avg_l2_error
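As a rough illustration of how the three benchmark metrics relate to raw measurements, here is a minimal sketch. It is not the benchmark script from this PR. The function signatures are hypothetical, and the exact definition of avg_per_pixel_l2_error is an assumption: it is taken here to be the L2 norm of the per-pixel channel difference, averaged over pixels.

```python
import math


def compression_factor(image_bytes: float, video_bytes: float) -> float:
    """Size of the original images divided by size of the video (higher is better)."""
    return image_bytes / video_bytes


def load_time_factor(image_load_s: float, video_load_s: float) -> float:
    """Image loading time divided by video decoding time (higher is better)."""
    return image_load_s / video_load_s


def avg_per_pixel_l2_error(original, decoded) -> float:
    """Mean L2 distance between corresponding pixels (lower is better).

    Assumption: each pixel is a tuple of channel values, e.g. (r, g, b).
    """
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and have the same length")
    total = sum(math.dist(p, q) for p, q in zip(original, decoded))
    return total / len(original)


# 15TB of images stored as 1TB of video:
print(compression_factor(15e12, 1e12))  # 15.0
# Video decoding takes twice as long as loading the images:
print(load_time_factor(1.0, 2.0))  # 0.5
# Two pixels, one reconstructed exactly and one off by a distance of 5:
print(avg_per_pixel_l2_error([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)]))  # 2.5
```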
How was it tested?
Pusht
Similar success rate, training loss, reward, update time, GPU utilisation.

Aloha
Generate video datasets
How to quickly test?
Add --debug 1 to the previous push_dataset_to_hub.py commands.
TODO
yuv444p