Closed
🚀 Feature
We need to update our dataset implementations to add secondary caching for extracted files, as a follow-up to #1494.
Motivation
Some of our datasets have a cache_compressed_dp followed by a cache_decompressed_dp, which is the behavior we want (e.g. EnWik9). On the other hand, some of our other datasets only cache the downloaded archive file and not the files extracted from that archive (e.g. SogouNews).
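To make the difference concrete, here is a minimal sketch of two-level caching using only the Python standard library. It is not the datapipe code used in the datasets: `cached_archive`, `cached_member`, `demo`, and the `stats` dict are hypothetical names, and a local file copy stands in for the real download. The point is only that the extracted file gets its own on-disk check, so a second call skips both the download and the extraction.

```python
import os
import shutil
import tarfile
import tempfile


def cached_archive(source, root, stats):
    """First-level cache: fetch the archive only if it is not on disk yet."""
    target = os.path.join(root, os.path.basename(source))
    if not os.path.exists(target):
        shutil.copyfile(source, target)  # stand-in for a real download
        stats["downloads"] += 1
    return target


def cached_member(source, member, root, stats):
    """Second-level cache: extract a member only if it is not on disk yet."""
    target = os.path.join(root, member)
    if not os.path.exists(target):
        archive = cached_archive(source, root, stats)
        with tarfile.open(archive) as tar:
            extracted = tar.extractfile(member)
            with open(target, "wb") as out:
                shutil.copyfileobj(extracted, out)
        stats["extractions"] += 1
    return target


def demo():
    """Build a tiny archive, request the same member twice, report the work done."""
    stats = {"downloads": 0, "extractions": 0}  # hypothetical counters
    with tempfile.TemporaryDirectory() as tmp:
        remote = os.path.join(tmp, "remote")
        root = os.path.join(tmp, "cache")
        os.makedirs(remote)
        os.makedirs(root)

        # A small tar archive that plays the role of the remote file.
        payload = os.path.join(remote, "train.csv")
        with open(payload, "w") as f:
            f.write("label,text\n")
        source = os.path.join(remote, "data.tar.gz")
        with tarfile.open(source, "w:gz") as tar:
            tar.add(payload, arcname="train.csv")

        first = cached_member(source, "train.csv", root, stats)
        second = cached_member(source, "train.csv", root, stats)  # cache hit
        with open(second) as f:
            content = f.read()
    return first == second, content, stats
```

With only the first-level cache, the second request would still reopen and re-extract the archive; with the secondary cache, the second call returns as soon as it finds the extracted file on disk.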
Backlog of Dataset Tests
The following datasets need to be updated to add the secondary caching mechanism:
- SogouNews: Adding secondary caching to datasets (#1594)
- AmazonReviewFull: Adding secondary caching to datasets (#1594)
- CoNLL2000Chunking: Updating Conll2000Chunking dataset to be consistent with other datasets (#1590)
- DBpedia: Added caching to extracted files in DBPedia (#1571)
cc @parmeet @abhinavarora @VirgileHlav @erip