Stars
NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Crosslingual Generalization through Multitask Finetuning
Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022)
A collaborative project to collect datasets in Indonesian languages.
High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)
🐤 Nix-TTS: Lightweight and End-to-end Text-to-Speech via Module-wise Distillation
Welcome to our repository! This repository hosts the data for the research paper "IndoCollex: A Testbed for Morphological Transformation of Indonesian Word Colloquialism", published at ACL-IJCNLP 2021. …
Implementation of "Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation".
Benchmarking Multidomain English-Indonesian Machine Translation
TLX Training Gate materials, in Indonesian
Test case generation framework for competitive programming problems