53 | 53 |
54 | 54 | ## Timeline of LLMs
55 | 55 |
56 |    | -
   | 56 | +
57 | 57 |
58 | 58 | ## List of LLMs
59 | 59 |
71 | 71 | </thead>
72 | 72 | <tbody>
73 | 73 | <tr>
74 |    | - <td class="tg-nrix" align="center" rowspan="22">Publicly <br>Accessible</td>
   | 74 | + <td class="tg-nrix" align="center" rowspan="25">Publicly <br>Accessible</td>
75 | 75 | <td class="tg-baqh" align="center">T5</td>
76 | 76 | <td class="tg-0lax" align="center">2019/10</td>
77 | 77 | <td class="tg-baqh" align="center">11</td>
131 | 131 | <td class="tg-baqh" align="center">175</td>
132 | 132 | <td class="tg-0lax" align="center"><a href="https://arxiv.org/abs/2205.01068">Paper</a></td>
133 | 133 | </tr>
| 134 | + <tr>
| 135 | + <td class="tg-baqh" align="center">YaLM</td>
| 136 | + <td class="tg-0lax" align="center">2022/06</td>
| 137 | + <td class="tg-baqh" align="center">100</td>
| 138 | + <td class="tg-0lax" align="center"><a href="https://github.com/yandex/YaLM-100B">Github</a></td>
| 139 | + </tr>
134 | 140 | <tr>
135 | 141 | <td class="tg-baqh" align="center">NLLB</td>
136 | 142 | <td class="tg-0lax" align="center">2022/07</td>
197 | 203 | <td class="tg-baqh" align="center">13</td>
198 | 204 | <td class="tg-0lax" align="center"><a href="https://vicuna.lmsys.org/">Blog</a></td>
199 | 205 | </tr>
| 206 | + <tr>
| 207 | + <td class="tg-baqh" align="center">ChatGLM</td>
| 208 | + <td class="tg-0lax" align="center">2023/03</td>
| 209 | + <td class="tg-baqh" align="center">6</td>
| 210 | + <td class="tg-0lax" align="center"><a href="https://github.com/THUDM/ChatGLM-6B">Github</a></td>
| 211 | + </tr>
| 212 | + <tr>
| 213 | + <td class="tg-baqh" align="center">CodeGeeX</td>
| 214 | + <td class="tg-0lax" align="center">2023/03</td>
| 215 | + <td class="tg-baqh" align="center">13</td>
| 216 | + <td class="tg-0lax" align="center"><a href="https://arxiv.org/abs/2303.17568">Paper</a></td>
| 217 | + </tr>
200 | 218 | <tr>
201 | 219 | <td class="tg-baqh" align="center">Koala</td>
202 | 220 | <td class="tg-0lax" align="center">2023/04</td>
323 | 341 | <td class="tg-baqh" align="center">54</td>
324 | 342 | <td class="tg-0lax" align="center"><a href="https://cohere.ai/">Homepage</a></td>
325 | 343 | </tr>
326 |    | - <tr>
327 |    | - <td class="tg-baqh" align="center">YaLM</td>
328 |    | - <td class="tg-0lax" align="center">2022/06</td>
329 |    | - <td class="tg-baqh" align="center">100</td>
330 |    | - <td class="tg-0lax" align="center"><a href="https://github.com/yandex/YaLM-100B">Github</a></td>
331 |    | - </tr>
332 | 344 | <tr>
333 | 345 | <td class="tg-baqh" align="center">AlexaTM</td>
334 | 346 | <td class="tg-0lax" align="center">2022/08</td>
395 | 407 |
396 | 408 |
397 | 409 |
    | 410 | +
398 | 411 | ## Resources of LLMs
399 | 412 |
400 | 413 | ### Publicly Available Models
412 | 425 | 11. <u>NLLB</u>: **"No Language Left Behind: Scaling Human-Centered Machine Translation"**. *NLLB Team.* arXiv 2022. [[Paper](https://arxiv.org/abs/2207.04672)] [[Checkpoint](https://github.com/facebookresearch/fairseq/tree/nllb)]
413 | 426 | 12. <u>BLOOM</u>: **"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"**. *BigScience Workshop*. arXiv 2022. [[Paper](https://arxiv.org/abs/2211.05100)] [[Checkpoint](https://huggingface.co/bigscience/bloom)]
414 | 427 | 13. <u>GLM</u>: **"GLM-130B: An Open Bilingual Pre-trained Model"**. *Aohan Zeng et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2210.02414)] [[Checkpoint](https://github.com/THUDM/GLM-130B)]
415 |     | -13. <u>Flan-T5</u>: **"Scaling Instruction-Finetuned Language Models"**. *Hyung Won Chung et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2210.11416)] [[Checkpoint](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)]
416 |     | -14. <u>mT0 && BLOOMZ</u>: **"Crosslingual Generalization through Multitask Finetuning"**. *Niklas Muennighoff et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2211.01786)] [[Checkpoint](https://github.com/bigscience-workshop/xmtf)]
417 |     | -15. <u>Galactica</u>: **"Galactica: A Large Language Model for Science"**. *Ross Taylor et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2211.09085)] [[Checkpoint](https://huggingface.co/facebook/galactica-120b)]
418 |     | -16. <u>OPT-IML</u>: **"OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization"**. *Srinivasan et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2212.12017)] [[Checkpoint](https://huggingface.co/facebook/opt-iml-30b)]
419 |     | -17. <u>Pythia</u>: **"Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling"**. *Stella Biderman et al.* arXiv 2023. [[Paper](https://arxiv.org/abs/2304.01373)] [[Checkpoint](https://github.com/EleutherAI/pythia)]
420 |     | -17. <u>LLaMA</u>: **"LLaMA: Open and Efficient Foundation Language Models"**. *Hugo Touvron et al.* arXiv 2023. [[Paper](https://arxiv.org/abs/2302.13971v1)] [[Checkpoint](https://github.com/facebookresearch/llama)]
    | 428 | +14. <u>Flan-T5</u>: **"Scaling Instruction-Finetuned Language Models"**. *Hyung Won Chung et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2210.11416)] [[Checkpoint](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)]
    | 429 | +15. <u>mT0 && BLOOMZ</u>: **"Crosslingual Generalization through Multitask Finetuning"**. *Niklas Muennighoff et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2211.01786)] [[Checkpoint](https://github.com/bigscience-workshop/xmtf)]
    | 430 | +16. <u>Galactica</u>: **"Galactica: A Large Language Model for Science"**. *Ross Taylor et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2211.09085)] [[Checkpoint](https://huggingface.co/facebook/galactica-120b)]
    | 431 | +17. <u>OPT-IML</u>: **"OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization"**. *Srinivasan et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2212.12017)] [[Checkpoint](https://huggingface.co/facebook/opt-iml-30b)]
    | 432 | +18. <u>CodeGeeX</u>: **"CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X"**. *Qinkai Zheng et al.* arXiv 2023. [[Paper](https://arxiv.org/abs/2303.17568)] [[Checkpoint](https://github.com/THUDM/CodeGeeX)]
    | 433 | +19. <u>Pythia</u>: **"Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling"**. *Stella Biderman et al.* arXiv 2023. [[Paper](https://arxiv.org/abs/2304.01373)] [[Checkpoint](https://github.com/EleutherAI/pythia)]
    | 434 | +20. <u>LLaMA</u>: **"LLaMA: Open and Efficient Foundation Language Models"**. *Hugo Touvron et al.* arXiv 2023. [[Paper](https://arxiv.org/abs/2302.13971v1)] [[Checkpoint](https://github.com/facebookresearch/llama)]
421 | 435 |
422 | 436 | ### Closed-source Models
423 | 437 |
@@ -786,5 +800,7 @@ The authors would like to thank Yankai Lin and Yutao Zhu for proofreading this p
786 | 800 | | V1 | 2023/03/31 | The initial version. |
787 | 801 | | V2 | 2023/04/09 | Add the affiliation information.<br/>Revise Figure 1 and Table 1 and clarify the <br/>corresponding selection criterion for LLMs.<br/>Improve the writing.<br/>Correct some minor errors. |
788 | 802 | | V3 | 2023/04/11 | Correct the errors for library resources. |
789 |     | -| V4 | 2023/04/12 | Revise Figure 1 and Table 1, and clarify the release date of LLMs |
    | 803 | +| V4 | 2023/04/12 | Revise Figure 1 and Table 1 and clarify the release date of LLMs. |
    | 804 | +| V5 | 2023/04/16 | Add a new Section 2.2 about<br/>the technical evolution of GPT-series models. |
    | 805 | +| V6 | 2023/04/18 | Add some new models in<br/>Table 1 and Figure 1. |
790 | 806 |