Open
Description
I brought over an issue from Hugging Face, and I have the same problem.
Great ideas in the paper. I have one question about the checkpoint: you start from the Qwen base model, which has no reasoning or instruction tuning, right? It also seems that you have no supervised step to elicit initial reasoning. I am wondering how the model learns to use the tokens. What am I missing? Did you use any elaborate prompt templates? If so, what do they look like? I couldn't find any information on that in the paper.
@ahatamiz Could you please provide some ideas?