-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Download datasets from hugging face #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, otherwise LGTM
torchvision = "^0.17.1" | ||
h5py = "^3.10.0" | ||
dm-control = "1.0.14" | ||
huggingface-hub = {extras = ["hf-transfer"], version = "^0.21.4"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also add it to the CI env:
cd .github/poetry/cpu && poetry add "huggingface-hub[hf-transfer]"
(I know, this is going to be a pain until we have better CI pipelines)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!!!! @aliberts if you have cycles, might be good to add a test for this in the meantime (if possible).
Come to think of it, the CI passed because it uses the test artifacts stored by git lfs from the repo and so never needs to download the dataset. Should we have a light/mock version of the dataset to download for testing? Should we test this at all? Not sure what the best course of action would be here. |
…t in the pyproject.toml and will be skipped: - huggingface-hub If you want to update it to the latest compatible version, you can use `poetry update package`. If you prefer to upgrade it to the latest available version, you can use `poetry add package@latest`. Nothing to add.
There might be a utility in huggingface-cli but YOLO (for now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
lerobot/common/datasets/abstract.py
Outdated
self.data_dir = self.root / self.dataset_id | ||
|
||
storage = self._download_or_load_storage() | ||
self.root = root if root is None else Path(root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would then suggest to remove the type hint root: Path = None
in the signature above but this could be work for a dedicated PR on type hinting later on, perhaps not urgent at the moment
EDIT: Nevermind! I didn't see your reply above when writing this.
(root: Path | None = None
would be the way to go then)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…_hf_dataset Download datasets from hugging face
Download datasets from hugging face
We can quickly download and load our datasets with:
Datasets will be stored in
.cache
by default:Uploading / updating is easy. I added some info in README.
Datasets added:
Future work: