Closed
Description
🚀 Feature
Add support for the 8 remaining datasets of the GLUE benchmark (SST-2 is already supported): CoLA, MRPC, QQP, STS-B, MNLI, QNLI, RTE, WNLI.
Motivation
Adding support for all GLUE datasets has a lot of value in itself: GLUE is one of the most widely used benchmarks in the NLP community. Furthermore, our planned effort to develop a cohesive API for Multi-Task Learning requires us to expand our suite of datasets, starting with GLUE.
Additional context
We have already created a streamlined dataset API that consistently uses DataPipes for dataset download and load operations, along with a testing methodology that relies on mock data (see #1493). This feature only requires adding support for these 8 datasets following that methodology.
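For anyone picking up one of the datasets below, the mock-data testing idea can be sketched roughly as follows. This is a standard-library illustration only, not the DataPipes-based implementation from #1493: the helper names `write_mock_file` and `load_samples`, the file name `mock_train.tsv`, and the two-column `label<TAB>sentence` layout are all assumptions made for the example, and the real GLUE files each have their own layout.

```python
import csv
import os
import tempfile
from typing import Iterator, List, Tuple

Sample = Tuple[int, str]


def write_mock_file(root: str, num_rows: int = 5) -> Tuple[str, List[Sample]]:
    """Write a tiny mock TSV file and return its path plus the expected samples.

    Hypothetical helper: the "label<TAB>sentence" layout is an assumption,
    not the layout of any specific GLUE dataset.
    """
    path = os.path.join(root, "mock_train.tsv")
    expected: List[Sample] = []
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for i in range(num_rows):
            label, sentence = i % 2, f"mock sentence {i}"
            writer.writerow([label, sentence])
            expected.append((label, sentence))
    return path, expected


def load_samples(path: str) -> Iterator[Sample]:
    """Stand-in for the dataset loader under test: lazily yield (label, sentence)."""
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            yield int(row[0]), row[1]


if __name__ == "__main__":
    # The test compares what the loader yields against what the mock file contains.
    with tempfile.TemporaryDirectory() as root:
        path, expected = write_mock_file(root)
        assert list(load_samples(path)) == expected
```

Generating the mock file and the expected samples in the same helper keeps the two in sync, so the test never depends on downloading the real dataset.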
Datasets
- CoLA: Add support for CoLA dataset with unit tests (#1711)
- MRPC: Add support for MRPC dataset with unit tests (#1712)
- QQP: Add support for QQP dataset with unit tests (#1713)
- STS-B: Add support for STS-B dataset with unit tests (#1714)
- MNLI: Add support for MNLI dataset with unit tests (#1715)
- QNLI: Add support for QNLI dataset with unit tests (#1717)
- RTE: Add support for RTE dataset with unit tests (#1721)
- WNLI: Add support for WNLI dataset with unit tests (#1724)