* first draft of updated evaluation content
* first draft of copied content
* drop ORPO and make exercise conform to unit 1
* add to toc
* readthrough edits
* improve dpo page 1
* tweak 3.1 more
* improve readability in DPO page 3
* remove learning objectives from dpo 3
* formatting in submission page
* clarity in unit 3 content on preference alignment and DPO
* Update DPO section in unit 3 to enhance mathematical clarity and improve formatting of hyperparameters and next steps.
* Update DPO setup instructions in unit 3 with package version upgrades, correct model references, and enhance clarity in code examples.
* Update model reference in DPO setup instructions in unit 3 to correct base model name.
* update readme with release changes
* [UNIT 2] re-release evaluation unit for 2025 (#231)
* first draft of updated evaluation content
* update vLLM with styling and links
* Update evaluation content in unit 2.1: Corrected capitalization of "model Hub" and updated alternative evaluation tool links for consistency.
* Refactor output formatting in unit 2: Updated code block styles to include language annotations and removed unnecessary HTML tags for improved readability.
* Fix capitalization of "vLLM"
* Enhance unit 2 content: Added hyperlinks to MMLU, TruthfulQA, BBH, WinoGrande
* Update link for Hugging Face Jobs documentation in unit 2
* Change example dataset from SmolTalk2 (#250)
Co-authored-by: burtenshaw <[email protected]>
* Small nits updated (#251)
Co-authored-by: burtenshaw <[email protected]>
* Unit 4 rerelease (#252)
* restore languages from main
* restore changes in notebooks
* restore v1 changes
* restore github changes
* restore unit notebook
* restore unit 1 improvements from main
* restore more improvements
* torch_dtype to dtype (#253)
* fix all torch_dtype refs
---------
Co-authored-by: Carlos Miguel Patiño <[email protected]>
Co-authored-by: Sergio Paniego Blanco <[email protected]>
@@ -1203,7 +1206,7 @@ We instantiate the trainer, capture a pre-training baseline generation, launch `
 trainer = SFTTrainer(
     model=model,
-    train_dataset=dataset["train"],
+    train_dataset=formatted_dataset,
     args=config,
 )
 ```
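The hunk above swaps the raw `dataset["train"]` split for a preprocessed `formatted_dataset`. As a rough illustration of what such a formatting step might do, here is a minimal, self-contained sketch; the `format_example` helper and the `{"messages": [...]}` record schema are assumptions for illustration, not the repository's actual preprocessing code:

```python
# Hypothetical sketch: flatten chat-style records into a plain "text" field,
# the kind of preprocessing that could produce `formatted_dataset`.
def format_example(example):
    # Assumed schema: {"messages": [{"role": ..., "content": ...}, ...]}
    lines = [f"{m['role']}: {m['content']}" for m in example["messages"]]
    return {"text": "\n".join(lines)}

raw = [
    {"messages": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ]}
]

# In the real setup this would be a datasets.Dataset.map call; a plain
# list comprehension keeps the sketch dependency-free.
formatted_dataset = [format_example(ex) for ex in raw]
print(formatted_dataset[0]["text"])
```

The formatted records then feed directly into `SFTTrainer` via its `train_dataset` argument, as the diff shows.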
@@ -1249,22 +1252,23 @@ In the previous exercises we've dived deep into using TRL's Python API for fine-
 We can define a command in the TRL CLI to fine-tune a model. We'll be able to run it with the `trl sft` command. The CLI command and Python API share the same configuration options.
 
+We preprocessed the `smoltalk_everyday_convs_reasoning_Qwen3_32B_think` subset of SmolTalk2 so that it is easier to work with when using the TRL CLI.