
Add Secondary Caching for Extracted Datasets Files #1589

Closed
@Nayef211

🚀 Feature

We need to update our dataset implementations to add secondary caching for extracted files, as a follow-up to #1494.

Motivation

Some of our datasets (e.g. EnWik9) have both a cache_compressed_dp, which caches the downloaded archive, and a cache_decompressed_dp, which caches the files extracted from it. This is the behavior we want. Other datasets (e.g. SogouNews) cache only the downloaded archive and not the files extracted from it, so extraction is repeated instead of being reused from disk.
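To illustrate the two cache levels, here is a minimal, stdlib-only Python sketch. It does not use torchtext or torchdata APIs. The function names (load_split, fake_archive) and file names (dataset.zip, train.csv) are hypothetical, and fake_archive stands in for an HTTP download. The secondary cache is checked first, so once the extracted file exists on disk, neither the archive nor the network is touched.

```python
import io
import os
import tempfile
import zipfile
from typing import Callable, Dict


def fake_archive() -> bytes:
    """Hypothetical stand-in for a download: a zip holding one CSV file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("train.csv", "label,text\n1,hello\n")
    return buf.getvalue()


def load_split(root: str, fetch: Callable[[], bytes], stats: Dict[str, int]) -> str:
    """Return the contents of train.csv using a two-level on-disk cache."""
    archive_path = os.path.join(root, "dataset.zip")               # primary cache
    extracted_path = os.path.join(root, "dataset", "train.csv")    # secondary cache

    # Secondary cache hit: reuse the extracted file and skip the archive entirely.
    if os.path.exists(extracted_path):
        with open(extracted_path, encoding="utf-8") as f:
            return f.read()

    # Primary cache: fetch the compressed archive only if it is not on disk yet.
    if not os.path.exists(archive_path):
        stats["downloads"] += 1
        with open(archive_path, "wb") as f:
            f.write(fetch())

    # Populate the secondary cache by extracting the member we need.
    stats["extractions"] += 1
    with zipfile.ZipFile(archive_path) as zf:
        zf.extract("train.csv", os.path.dirname(extracted_path))
    with open(extracted_path, encoding="utf-8") as f:
        return f.read()


stats = {"downloads": 0, "extractions": 0}
with tempfile.TemporaryDirectory() as root:
    first = load_split(root, fake_archive, stats)   # downloads and extracts
    second = load_split(root, fake_archive, stats)  # served from the extracted file
print(first == second, stats)  # True {'downloads': 1, 'extractions': 1}
```

With only the primary cache (the archive-only pattern described above), the second call would still skip the download but would extract again. The secondary cache removes that repeated work.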

Backlog of Dataset Tests

The following datasets need to be updated to add the secondary caching mechanism:

cc @parmeet @abhinavarora @VirgileHlav @erip
