
GhanaNLP/ghanaian-nlp-datasets-models

Ghanaian NLP Datasets & Models

This repository curates NLP datasets and models for languages spoken in Ghana. It is maintained by Ghana NLP, with the goal of supporting research and development of natural language processing for all Ghanaian languages.

Join Ghana NLP to stay involved.



Languages with Data


Akan (aka)

📁 Datasets

| Name | Description | Link |
|---|---|---|
| Twi-English Parallel Sentences | Aligned Twi-English translation pairs | View |
| Fante Speech Transcribed | Transcribed multi-speaker speech dataset | View |
| Twi Transcribed | Asante Twi Bible single-speaker transcribed dataset (split at verse level) | View |
| Twi Transcribed | Asante Twi Bible single-speaker transcribed dataset (split at utterance level) | View |

🤖 Models

| Name | Description | Link |
|---|---|---|
| ABENA | BERT models for Asante Twi (cased [1], uncased [2], distilled uncased [3]) and Akuapem Twi (cased [4]) | 1, 2, 3, 4 |
| Akan Whisper | Speech recognition model for Akan | View |
| Asante Twi Speech Recognition | Speech recognition and transcription model for Asante Twi | View |

Dagbani (dag)

📁 Datasets

| Name | Description | Link |
|---|---|---|
| Dagbani Orthography | Spelling guide corpus | View |

🤖 Models

| Name | Task | Framework | Link |
|---|---|---|---|
| DagBERT | Language modeling | Transformers | GitHub |

Ga (gaa)

📁 Datasets

(No entries yet — contribute!)

🤖 Models

(No entries yet — contribute!)


Languages without Data (Awaiting Contributions)

  • Abron (abr) (No data)
  • Adamorobe Sign Language (ads) (No data)
  • Adangbe (adq) (No data)
  • Adele (ade) (No data)
  • Ahanta (aha) (No data)
  • Akposo (kpo) (No data)
  • Animere (anf) (No data)
  • Anufo (cko) (No data)
  • Anyin (any) (No data)
  • Avatime (avn) (No data)
  • Awutu (afu) (No data)
  • Bimoba (bim) (No data)
  • Bisa (bib) (No data)
  • Bondoukou Kulango (kzc) (No data)
  • Boro (xxb) (No data)
  • Buli (bwu) (No data)
  • Chakali (cli) (No data)
  • Chala (cll) (No data)
  • Cherepon (cpn) (No data)
  • Chumburung (ncu) (No data)
  • Dangme (ada) (No data)
  • Deg (mzw) (No data)
  • Delo (ntr) (No data)
  • Dompo (doy) (No data)
  • Dwang (nnu) (No data)
  • Esahie (sfw) (No data)
  • Ewe (ewe) (No data)
  • Farefare (gur) (No data)
  • Ga (gaa) (No data)
  • Ghanaian Pidgin English (gpe) (No data)
  • Ghanaian Sign Language (gse) (No data)
  • Gikyode (acd) (No data)
  • Gonja (gjn) (No data)
  • Gua (gwx) (No data)
  • Hanga (hag) (No data)
  • Jwira-Pepesa (jwi) (No data)
  • Kamara (jmr) (No data)
  • Kantosi (xkt) (No data)
  • Kasem (xsm) (No data)
  • Konkomba (xon) (No data)
  • Konni (kma) (No data)
  • Kplang (kph) (No data)
  • Krache (kye) (No data)
  • Kusaal (kus) (No data)
  • Larteh (lar) (No data)
  • Lelemi (lef) (No data)
  • Ligbi (lig) (No data)
  • Logba (lgq) (No data)
  • Mampruli (maw) (No data)
  • Nafaanra (nfr) (No data)
  • Nawuri (naw) (No data)
  • Nchumbulu (nlu) (No data)
  • Nkami (nkq) (No data)
  • Nkonya (nko) (No data)
  • Ntcham (bud) (No data)
  • Nyagbo (nyb) (No data)
  • Nzema (nzi) (No data)
  • Paasaal (sig) (No data)
  • Safaliba (saf) (No data)
  • Sekpele (lip) (No data)
  • Selee (snw) (No data)
  • Siwu (akp) (No data)
  • Southern Birifor (biv) (No data)
  • Southern Dagaare (dga) (No data)
  • Tafi (tcd) (No data)
  • Tampulma (tpm) (No data)
  • Tumulung Sisaala (sil) (No data)
  • Tuwuli (bov) (No data)
  • Vagla (vag) (No data)
  • Wali (wlx) (No data)
  • Wasa (wss) (No data)
  • Western Sisaala (ssl) (No data)

🤝 Contributing

  1. Fork the repo
  2. Add your dataset or model under the correct language section
  3. Submit a pull request with a clear description

Or open an issue to suggest links or ask questions.
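
Before opening a pull request, it can help to check that a new entry has every field the tables above use. The sketch below is one possible way to do that in Python. It is not part of this repository: the `Entry` class, the `find_problems` function, and the example URL are hypothetical names chosen for illustration, and the three-letter code check is an assumption based on the codes shown in the language headings (for example `aka`, `dag`, `gaa`).

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Entry:
    """One proposed dataset or model row (hypothetical structure, not a repo schema)."""
    language_code: str  # e.g. "aka", "dag", "gaa", as used in the section headings
    name: str
    description: str
    link: str


def find_problems(entry: Entry) -> list[str]:
    """Return human-readable problems; an empty list means the entry looks complete."""
    problems = []
    # Assumption: language codes are three lowercase letters, like the headings above.
    if not re.fullmatch(r"[a-z]{3}", entry.language_code):
        problems.append("language code should be three lowercase letters")
    if not entry.name.strip():
        problems.append("name is empty")
    if not entry.description.strip():
        problems.append("description is empty")
    # A usable link needs both a scheme (http or https) and a host.
    parsed = urlparse(entry.link)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("link should be a full http(s) URL")
    return problems


if __name__ == "__main__":
    # Placeholder values only; this is not a real dataset.
    draft = Entry("gaa", "Ga Example Corpus", "Placeholder description",
                  "https://example.org/ga-corpus")
    print(find_problems(draft))
```

An empty result only means the fields are present and well-formed. It does not confirm that the link resolves or that the resource's license permits reuse.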


🔗 Join Ghana NLP

We’re a community of researchers and developers building NLP tools for Ghanaian languages.
👉 Join us here


📄 License

Each dataset or model has its own license. Check links or contact the maintainers for reuse conditions.

Maintained with 💛 by Ghana NLP
