Elicit Machine Learning Reading List

Purpose

This curriculum helps new Elicit employees build background in machine learning, with a focus on language models. I’ve tried to balance papers that are relevant for deploying ML in production against techniques that matter for longer-term scalability.

If you don’t work at Elicit yet, we’re hiring ML and software engineers.

How to read

Fundamentals

Introduction to machine learning

Tier 1

Tier 2

Tier 3

Transformers

Tier 1

Tier 2

Tier 3

Tier 4+

Key foundation model architectures

Tier 1

Language Models are Unsupervised Multitask Learners (GPT-2)
Language Models are Few-Shot Learners (GPT-3)

Tier 2

✨ DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek-R1)
✨ DeepSeek-V3 Technical Report (DeepSeek-V3)
✨ The Llama 3 Herd of Models (Llama 3)
LLaMA: Open and Efficient Foundation Language Models (LLaMA)
Training language models to follow instructions with human feedback (OpenAI Instruct)

Tier 3

✨ LLaMA 2: Open Foundation and Fine-Tuned Chat Models (LLaMA 2)
✨ Qwen2.5 Technical Report (Qwen2.5)
✨ Titans: Learning to Memorize at Test Time
✨ Byte Latent Transformer
✨ Phi-4 Technical Report (phi-4)

Tier 4+

Evaluating Large Language Models Trained on Code (OpenAI Codex)
Mistral 7B (Mistral)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
Gemini: A Family of Highly Capable Multimodal Models (Gemini)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Mamba)
Scaling Instruction-Finetuned Language Models (Flan)
Efficiently Modeling Long Sequences with Structured State Spaces (video) (S4)
Consistency Models
Model Card and Evaluations for Claude Models (Claude 2)
OLMo: Accelerating the Science of Language Models
PaLM 2 Technical Report (PaLM 2)
Textbooks Are All You Need II: phi-1.5 technical report (phi 1.5)
Visual Instruction Tuning (LLaVA)
A General Language Assistant as a Laboratory for Alignment
Finetuned Language Models Are Zero-Shot Learners (Google Instruct)
Galactica: A Large Language Model for Science
LaMDA: Language Models for Dialog Applications (Google Dialog)
OPT: Open Pre-trained Transformer Language Models (Meta GPT-3)
PaLM: Scaling Language Modeling with Pathways (PaLM)
Program Synthesis with Large Language Models (Google Codex)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher (Gopher)
Solving Quantitative Reasoning Problems with Language Models (Minerva)
UL2: Unifying Language Learning Paradigms (UL2)

Training and finetuning

Tier 2

Tier 3

Tier 4+

Reasoning and runtime strategies

In-context reasoning

Tier 2

Tier 3

Tier 4+

Task decomposition

Tier 1

Tier 2

Tier 3

Tier 4+

Debate

Tier 2

AI safety via debate

Tier 3

Tier 4+

Tool use and scaffolding

Tier 2

Tier 3

Tier 4+

Honesty, factuality, and epistemics

Tier 2

Self-critiquing models for assisting human evaluators

Tier 3

Tier 4+

Applications

Science

Tier 2

Tier 3

Tier 4+

Forecasting

Tier 3

Tier 4+

Are Transformers Effective for Time Series Forecasting?

Search and ranking

Tier 2

Learning Dense Representations of Phrases at Scale
Text and Code Embeddings by Contrastive Pre-Training (OpenAI embeddings)

Tier 3

Tier 4+

ML in practice

Production deployment

Tier 1

Tier 2

Benchmarks

Tier 2

Tier 3

Tier 4+

Datasets

Tier 2

Tier 3

Advanced topics

World models and causality

Tier 3

Tier 4+

Planning

Tier 4+

Uncertainty, calibration, and active learning

Tier 2

Tier 3

Tier 4+

Interpretability and model editing

Tier 2

Tier 3

Tier 4+

Reinforcement learning

Tier 2

Tier 3

Tier 4+

The big picture

AI scaling

Tier 1

Tier 2

AI and compute
Scaling Laws for Transfer
Training Compute-Optimal Large Language Models (Chinchilla)

Tier 3

Tier 4+

AI safety

Tier 1

Tier 2

Tier 3

Tier 4+

Economic and social impacts

Tier 2

✨ AI 2027
✨ Situational Awareness (Aschenbrenner)

Tier 3

Tier 4+

Philosophy

Tier 2

Meaning without reference in large language models

Tier 4+

Maintainer

[email protected]


