Add support for all datasets of the GLUE benchmark #1710

Closed
@vcm2114

Description

🚀 Feature

Add support for the 8 remaining datasets of the GLUE benchmark (SST-2 is already supported): CoLA, MRPC, QQP, STS-B, MNLI, QNLI, RTE, WNLI.

Motivation

Adding support for all GLUE datasets has a lot of value in itself: GLUE is one of the most widely used benchmarks in the NLP community. Furthermore, our planned effort to develop a cohesive API for Multi-Task Learning requires us to expand our suite of datasets, starting with GLUE.

Additional context

We have already created a streamlined dataset API that consistently uses DataPipes for dataset download and load operations, along with a testing methodology that relies on mock data (see #1493). This feature only requires adding support for these 8 datasets following that methodology.
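To illustrate the mock-data idea, here is a minimal, library-agnostic sketch using only the Python standard library. It is not the DataPipes-based implementation from #1493: the helper names (`write_mock_tsv`, `load_sentence_pairs`, `load_mock_samples`) and the three-column TSV layout are assumptions made for illustration, and the real GLUE files differ in layout from task to task.

```python
import csv
import os
import tempfile

# Hypothetical mock rows: a header plus two labeled sentence pairs.
# The column layout is an assumption, not the layout of any specific GLUE file.
MOCK_ROWS = [
    ("label", "sentence1", "sentence2"),
    ("1", "A man is eating.", "Someone is eating."),
    ("0", "A dog barks.", "The sky is blue."),
]


def write_mock_tsv(path, rows):
    """Write rows as tab-separated lines, standing in for a downloaded file."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")


def load_sentence_pairs(path):
    """Yield (label, sentence1, sentence2) tuples from a TSV file with a header."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader)  # skip the header row
        for row in reader:
            yield int(row[0]), row[1], row[2]


def load_mock_samples():
    """Create the mock file in a temporary directory and load it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "train.tsv")
        write_mock_tsv(path, MOCK_ROWS)
        return list(load_sentence_pairs(path))


samples = load_mock_samples()
# The loader should return exactly what the mock file contains, minus the header.
expected = [(int(label), s1, s2) for label, s1, s2 in MOCK_ROWS[1:]]
assert samples == expected
```

The point of the pattern is that a test never touches the network: it generates a tiny file with a known layout, runs the loader on it, and compares the result against the rows it wrote.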

Datasets