Skip to content

QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning PaLM with only five examples per language. We use the synthetic data to finetune downstream QA models leading to improved accuracy in comparison to English-only and translation-based baselines.

Notifications You must be signed in to change notification settings

sabyaghossh/QAmeleon

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 

Repository files navigation

QAmeleon introduces synthetic multilingual QA data contaning in 8 langauges using PaLM-540B, a large language model. This dataset was generated by prompt tuning PaLM with only five examples per language. We use the synthetic data to finetune downstream QA models leading to improved accuracy in comparison to English-only and translation-based baselines.

Data available at https://storage.googleapis.com/qameleon/qamelon_pt_accepted.csv

More details can be found in the paper which can be cited as follows:

@misc{agrawal2022qameleon,
      title={QAmeleon: Multilingual QA with Only 5 Examples}, 
      author={Priyanka Agrawal and Chris Alberti and Fantine Huot and Joshua Maynez and Ji Ma and Sebastian Ruder and Kuzman Ganchev and Dipanjan Das and Mirella Lapata},
      year={2022},
      eprint={2211.08264},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

This dataset contains a total of 47173 Question Answer instances across 8 langauges, following is the count per language.

Language Count
ar 6966
bn 6084
fi 5028
id 6797
ko 6471
ru 5557
sw 5597
te 4673
Total 47173

The QAmeleon dataset is released under the CC-BY 4.0 license.

About

QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning PaLM with only five examples per language. We use the synthetic data to finetune downstream QA models leading to improved accuracy in comparison to English-only and translation-based baselines.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published