-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Refactor the download and publication of the datasets and convert it into CLI script #95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the download and publication of the datasets and convert it into CLI script #95
Conversation
28d02c6
to
77863e9
Compare
…push dataset to hub
fea1a60
to
112f9e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the avalanche of comments and suggestions, but they are mostly minor.
This PR is so important. Huge congrats for reaching this state. It's almost done.
dry_run (bool, optional): If True, performs a dry run without actually pushing the dataset. Defaults to False. | ||
revision (str, optional): The revision of the dataset. Defaults to "v1.0". | ||
community_id (str, optional): The ID of the community. Defaults to "lerobot". | ||
preprocess (bool, optional): If True, preprocesses the dataset. Defaults to True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we should keep preprocess
argument. Could be worth simplifying.
preprocess (bool, optional): If True, preprocesses the dataset. Defaults to True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also asked myself this question, what we can do is rename this flag to --no-preprocess
and by default processing is done.
The flag is only interesting for users who want the raw dataset, which will be a small percentage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AdilZouitine for those users, they can comment the preprocess code lines ^^
I think it's safe to remove preprocess
argument
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
…nctions inside push_dataset_to_hub
available_datasets_without_env = ["lerobot/umi_cup_in_the_wild"] | ||
|
||
available_datasets = list( | ||
itertools.chain(*available_datasets_per_env.values(), available_datasets_without_env) | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, a TODO should be fine
…into CLI script (huggingface#95) Co-authored-by: Remi <[email protected]>
…into CLI script (#95) Co-authored-by: Remi <[email protected]>
…into CLI script (huggingface#95) Co-authored-by: Remi <[email protected]>
…into CLI script (huggingface#95) Co-authored-by: Remi <[email protected]>
What does this PR do?
Following discussions with @Cadene, this PR has two main objectives:
Aloha,
Umi,
Pusht,
orXarm
formats.I tested this PR with
tests/test_datasets.py::test_backward_compatibility
andDATA_DIR="tmp/data/path/save/to/disk" pytest -sx tests/test_datasets.py::test_backward_compatibility