Commit 2cc6721

burtenshaw, cmpatino, and sergiopaniego authored
[RELEASE] release 2 of the smol course 25 (#249)
* first draft of updated evaluation content
* first draft of copied content
* drop orpo and make exercise to conform to unit 1
* add to toc
* readthrough edits
* improve dpo page 1
* tweak 3. 1 more
* improve readability in DPO page 3
* remove learning objectives from dpo 3
* formatting in submission page
* clarity in unit 3 content on preference alignment and DPO
* Update DPO section in unit 3 to enhance mathematical clarity and improve formatting of hyperparameters and next steps.
* Update DPO setup instructions in unit 3 with package version upgrades, correct model references, and enhance clarity in code examples.
* Update model reference in DPO setup instructions in unit 3 to correct base model name.
* update readme with release changes
* [UNIT 2] re-release evaluation unit for 2025 (#231)
* first draft of updated evaluation content
* update vLLM with styling and links
* Update evaluation content in unit 2.1: Corrected capitalization of "model Hub" and updated alternative evaluation tool links for consistency.
* Refactor output formatting in unit 2: Updated code block styles to include language annotations and removed unnecessary HTML tags for improved readability.
* Fix capitalization of "vLLM"
* Enhance unit 2 content: Added hyperlinks to MMLU, TruthfulQA, BBH, WinoGrande
* Update link for Hugging Face Jobs documentation in unit 2
* Change example dataset from SmolTalk2 (#250)
Co-authored-by: burtenshaw <[email protected]>
* Small nits updated (#251)
Co-authored-by: burtenshaw <[email protected]>
* Unit 4 rerelease (#252)
* restore languages from main
* restore changes in notebooks
* restore v1 changes
* restore github changes
* restore unit notebook
* restore unit 1 improvements from main
* restore more improvements
* torch_dtype to dtype (#253)
* fix all torch_dtype refs
---------
Co-authored-by: Carlos Miguel Patiño <[email protected]>
Co-authored-by: Sergio Paniego Blanco <[email protected]>
1 parent e3f986f commit 2cc6721

25 files changed, +3253 -36 lines changed


notebooks/1/4.ipynb

Lines changed: 5 additions & 5 deletions
@@ -125,7 +125,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/",
@@ -681,11 +681,11 @@
 "\n",
 "# Load models (use smaller precision for memory efficiency)\n",
 "base_model = AutoModelForCausalLM.from_pretrained(\n",
-" base_model_name, torch_dtype=torch.float16, device_map=\"auto\"\n",
+" base_model_name, dtype=torch.float16, device_map=\"auto\"\n",
 ")\n",
 "\n",
 "instruct_model = AutoModelForCausalLM.from_pretrained(\n",
-" instruct_model_name, torch_dtype=torch.float16, device_map=\"auto\"\n",
+" instruct_model_name, dtype=torch.float16, device_map=\"auto\"\n",
 ")\n",
 "\n",
 "print(\"Models loaded successfully!\")\n"
@@ -1944,7 +1944,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 36,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/",
@@ -2010,7 +2010,7 @@
 "print(f\"Loading {model_name}...\")\n",
 "model = AutoModelForCausalLM.from_pretrained(\n",
 " model_name,\n",
-" torch_dtype=torch.float16, # Use float16 for memory efficiency\n",
+" dtype=torch.float16, # Use float16 for memory efficiency\n",
 " device_map=\"auto\",\n",
 " trust_remote_code=True,\n",
 ")\n",
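
The hunks above change only the keyword name (`torch_dtype` to `dtype`); the requested precision stays `torch.float16`, which the inline comment justifies as "for memory efficiency". As a rough illustration that is not part of the commit, the sketch below estimates weight memory from parameter count and bytes per parameter. The 3-billion parameter count is an assumed round figure and `weight_memory_gb` is a hypothetical helper; the estimate ignores activations, optimizer state, and framework overhead.

```python
# Bytes used to store one parameter in each floating-point format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype_name: str) -> float:
    """Approximate memory for the model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype_name] / 1e9


# Assumed round figure for a "3B" model; not taken from the commit.
num_params = 3_000_000_000

for name in ("float32", "float16"):
    print(f"{name}: {weight_memory_gb(num_params, name):.1f} GB")
# float32: 12.0 GB
# float16: 6.0 GB
```

Under these assumptions, halving the bytes per parameter halves the weight footprint, which is why the notebooks load in half precision.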

notebooks/4/4.ipynb

Lines changed: 624 additions & 0 deletions
Large diffs are not rendered by default.

pull_request_template.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# December 2024 Student Submission
+
+## Module Completed
+- [ ] Module 1: Instruction Tuning
+- [ ] Module 2: Preference Alignment
+- [ ] Module 3: Parameter-efficient Fine-tuning
+- [ ] Module 4: Evaluation
+- [ ] Module 5: Vision-language Models
+- [ ] Module 6: Synthetic Datasets
+- [ ] Module 7: Inference
+- [ ] Module 8: Deployment
+
+## Changes Made
+Describe what you've done in this PR:
+1. What concepts did you learn?
+2. What changes or additions did you make?
+3. Any challenges you faced?
+
+## Notebooks Added/Modified
+List any notebooks you've added or modified:
+- [ ] Added new example in `module_name/student_examples/my_example.ipynb`
+- [ ] Modified existing notebook with additional examples
+- [ ] Added documentation or comments
+
+## Checklist
+
+- [ ] I have read the module materials
+- [ ] My code runs without errors
+- [ ] I have pushed models and datasets to the huggingface hub
+- [ ] My PR is based on the `december-2024` branch
+
+## Questions or Discussion Points
+Add any questions you have or points you'd like to discuss:
+1.
+2.
+
+## Additional Notes
+Any other information that might be helpful for reviewers:
+

units/en/_toctree.yml

Lines changed: 33 additions & 0 deletions
@@ -20,4 +20,37 @@
   - local: unit1/6
     title: Submit your final project!
 
+- title: "2. Model Evaluation"
+  sections:
+  - local: unit2/1
+    title: Introduction to Model Evaluation
+  - local: unit2/2
+    title: vLLM Inference with Hugging Face Models
+  - local: unit2/3
+    title: Automatic Benchmarks
+  - local: unit2/4
+    title: Custom Domain Evaluation
+  - local: unit2/5
+    title: Submit your evaluation results!
+
+- title: "3. Preference Alignment"
+  sections:
+  - local: unit3/1
+    title: Introduction to Preference Alignment
+  - local: unit3/2
+    title: Direct Preference Optimization (DPO)
+  - local: unit3/3
+    title: Advanced DPO Techniques
+  - local: unit3/4
+    title: DPO Hands-on Implementation
 
+- title: "4. Vision Language Models"
+  sections:
+  - local: unit4/1
+    title: Introduction to Vision Language Models
+  - local: unit4/2
+    title: Using Pretrained VLMs
+  - local: unit4/3
+    title: Fine-Tuning VLMs
+  - local: unit4/4
+    title: Hands-On Fine-Tuning VLMs

units/en/unit0/1.mdx

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ This course is smol but fast! It's for software developers and engineers looking
 
 In this course, you will:
 
-📖 Study instruction tuning, supervised fine-tuning, preference alignment, evaluation, vision language models… and more!
+* 📖 Study instruction tuning, supervised fine-tuning, and preference alignment in theory and practice.
 * 🧑‍💻 Learn to use established fine-tuning frameworks and tools like TRL and Transformers.
 * 💾 Share your projects and explore fine-tuning applications created by the community.
 * 🏆 Participate in challenges where you will evaluate your fine-tuned models against other students.

units/en/unit1/4.md

Lines changed: 17 additions & 14 deletions
@@ -526,7 +526,7 @@ If we dive into the out put below, we can see that the instruct model's hybrid r
 <details>
 <summary>Output</summary>
 
-````python output
+```python output
 
 === TESTING REASONING CAPABILITIES ===
 
@@ -710,7 +710,7 @@ Starting with the dollars: 18 dollars plus 12 dollars is 30 dollars. Then the ce
 
 --------------------------------------------------
 
-````
+```
 
 </details>
 
@@ -1067,7 +1067,7 @@ print("=== PREPARING DATASET ===\n")
 
 # Option 1: Use SmolTalk2 (recommended for beginners)
 dataset = load_dataset("HuggingFaceTB/smoltalk2", "SFT")
-train_dataset = dataset["train"].select(range(1000)) # Use subset for faster training
+train_dataset = dataset["smoltalk_everyday_convs_reasoning_Qwen3_32B_think"].select(range(1000)) # Use subset for faster training
 
 # Option 2: Use your own processed dataset from Exercise 2
 # train_dataset = gsm8k_formatted.select(range(500))
@@ -1089,7 +1089,7 @@ def format_chat_template(example):
     ]
 
     # Apply chat template
-    text = tokenizer.apply_chat_template(
+    text = instruct_tokenizer.apply_chat_template(
         messages,
         tokenize=False,
         add_generation_prompt=False
@@ -1098,6 +1098,9 @@ def format_chat_template(example):
 
 # Apply formatting
 formatted_dataset = train_dataset.map(format_chat_template)
+formatted_dataset = formatted_dataset.remove_columns(
+    [col for col in formatted_dataset.column_names if col != "text"]
+)
 print(f"Formatted example: {formatted_dataset[0]['text'][:200]}...")
 ```
 
@@ -1203,7 +1206,7 @@ We instantiate the trainer, capture a pre-training baseline generation, launch `
 
 trainer = SFTTrainer(
     model=model,
-    train_dataset=dataset["train"],
+    train_dataset=formatted_dataset,
     args=config,
 )
 ```
@@ -1249,22 +1252,23 @@ In the previous exercises we've dived deep into using TRL's Python API for fine-
 
 We can define a command in TRL CLI to fine-tune a model. We'll be able to run it with `trl sft` command. The CLI command and Python API share the same configuration options.
 
+We preprocessed the `smoltalk_everyday_convs_reasoning_Qwen3_32B_think` subset of SmolTalk2 so that is easier to work with it when using the TRL CLI.
+
 ```bash
 # Fine-tune SmolLM3 using TRL CLI
 trl sft \
     --model_name_or_path HuggingFaceTB/SmolLM3-3B-Base \
-    --dataset_name HuggingFaceTB/smoltalk2 \
-    --dataset_config SFT \
+    --dataset_name HuggingFaceTB/smoltalk2_everyday_convs_think \
     --output_dir ./smollm3-sft-cli \
     --per_device_train_batch_size 4 \
     --gradient_accumulation_steps 2 \
     --learning_rate 5e-5 \
     --num_train_epochs 1 \
-    --max_seq_length 2048 \
+    --max_length 2048 \
     --logging_steps 10 \
     --save_steps 500 \
     --warmup_steps 100 \
-    --bf16 \
+    --bf16 True \
     --push_to_hub \
     --hub_model_id your-username/smollm3-sft-cli
 ```
@@ -1275,16 +1279,15 @@ For convenience and reproducibility, we can also create a configuration file to
 ```yaml
 # Model and dataset
 model_name_or_path: HuggingFaceTB/SmolLM3-3B-Base
-dataset_name: HuggingFaceTB/smoltalk2
-dataset_config: SFT
+dataset_name: HuggingFaceTB/smoltalk2_everyday_convs_think
 output_dir: ./smollm3-advanced-sft
 
 # Training hyperparameters
 per_device_train_batch_size: 2
 gradient_accumulation_steps: 4
 learning_rate: 3e-5
 num_train_epochs: 2
-max_seq_length: 4096
+max_length: 4096
 
 # Optimization
 warmup_steps: 200
@@ -1302,7 +1305,7 @@ remove_unused_columns: false
 logging_steps: 25
 eval_steps: 250
 save_steps: 500
-evaluation_strategy: steps
+eval_strategy: steps
 load_best_model_at_end: true
 metric_for_best_model: eval_loss
 
@@ -1324,7 +1327,7 @@ trl sft --config sft_config.yaml
 
 **If you get GPU out of memory errors:**
 - Reduce `per_device_train_batch_size` to 1
-- Reduce `max_seq_length` to 1024 or 512
+- Reduce `max_length` to 1024 or 512
 - Use `torch.cuda.empty_cache()` to clear GPU memory
 
 **If models fail to load:**
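
The out-of-memory advice in the last hunk of this file lowers `per_device_train_batch_size`. A rough sketch of the arithmetic that usually goes with that change: the effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices, so accumulation can be raised to compensate. The helper names and the single-device default are assumptions for illustration and are not part of the course material.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples that contribute to each optimizer step."""
    return per_device * grad_accum * num_devices


def accumulation_steps_for(target: int, per_device: int, num_devices: int = 1) -> int:
    """Gradient accumulation steps needed to reach `target` samples per optimizer step."""
    per_forward = per_device * num_devices
    if target % per_forward != 0:
        raise ValueError(f"{target} is not a multiple of {per_forward}")
    return target // per_forward


# Values from the CLI example (4 x 2) and the YAML example (2 x 4) in this file.
cli_effective = effective_batch_size(per_device=4, grad_accum=2)
yaml_effective = effective_batch_size(per_device=2, grad_accum=4)
print(cli_effective, yaml_effective)  # 8 8

# Dropping to a per-device batch of 1 while keeping 8 samples per optimizer step.
print(accumulation_steps_for(target=8, per_device=1))  # 8
```

On a single device, both examples in this file reach the same effective batch size of 8 through different splits, which is the trade the memory advice relies on.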

units/en/unit1/5.md

Lines changed: 4 additions & 6 deletions
@@ -64,7 +64,7 @@ config = SFTConfig(
 # Train
 trainer = SFTTrainer(
     model=model,
-    train_dataset=dataset["train"],
+    train_dataset=dataset["smoltalk_everyday_convs_reasoning_Qwen3_32B_think"],
     args=config,
 )
 trainer.train()
@@ -116,8 +116,7 @@ hf jobs uv run \
     --secrets HF_TOKEN \
     "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py" \
     --model_name_or_path HuggingFaceTB/SmolLM3-3B-Base \
-    --dataset_name HuggingFaceTB/smoltalk2 \
-    --dataset_config SFT \
+    --dataset_name HuggingFaceTB/smoltalk2_everyday_convs_think \
     --learning_rate 5e-5 \
     --per_device_train_batch_size 4 \
     --max_steps 1000 \
@@ -173,8 +172,7 @@ hf jobs uv run \
     --secrets HF_TOKEN \
     "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py" \
     --model_name_or_path HuggingFaceTB/SmolLM3-3B-Base \
-    --dataset_name HuggingFaceTB/smoltalk2 \
-    --dataset_config SFT \
+    --dataset_name HuggingFaceTB/smoltalk2_everyday_convs_think \
     --output_dir smollm3-lora-sft-jobs \
     --per_device_train_batch_size 4 \
     --learning_rate 5e-5 \
@@ -221,7 +219,7 @@ Training typically takes 30-90 minutes for 1000 steps depending on hardware and
 **Out of Memory Errors**:
 - Reduce `per_device_train_batch_size`
 - Enable gradient checkpointing
-- Use smaller `max_seq_length`
+- Use smaller `max_length`
 
 **Timeout Issues**:
 - Increase timeout parameter
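
Several hunks in this commit are pure renames: `torch_dtype` to `dtype` in the notebook, and `max_seq_length` to `max_length` and `evaluation_strategy` to `eval_strategy` in the unit 1 pages. A minimal sketch of applying the same renames to a plain Python dict of settings follows. `migrate_keys` and the sample dict are hypothetical; the mapping lists only the renames visible in this diff and makes no claim about which library versions accept which name.

```python
# Old name -> new name, limited to the renames that appear in this commit.
RENAMED_KEYS = {
    "torch_dtype": "dtype",
    "max_seq_length": "max_length",
    "evaluation_strategy": "eval_strategy",
}


def migrate_keys(settings: dict) -> dict:
    """Return a copy of `settings` with old key names replaced by new ones."""
    migrated = {}
    for key, value in settings.items():
        new_key = RENAMED_KEYS.get(key, key)
        # Refuse to guess when both spellings are present.
        if new_key != key and new_key in settings:
            raise ValueError(f"both {key!r} and {new_key!r} are set")
        migrated[new_key] = value
    return migrated


# Hypothetical settings written with the old names.
old_settings = {
    "learning_rate": 3e-5,
    "max_seq_length": 4096,
    "evaluation_strategy": "steps",
}
print(migrate_keys(old_settings))
# {'learning_rate': 3e-05, 'max_length': 4096, 'eval_strategy': 'steps'}
```

Raising on a conflict instead of silently picking one value keeps the migration from masking a config that sets the old and new names differently.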
