
Conversation

henk717 commented May 13, 2025

When testing GLM4 models I noticed that they leak <|endoftext|>. This happens because the converter sets the EOS token to <|endoftext|> and the EOT token to <|user|>. The Hugging Face config.json defines the EOS as <|user|>, which results in <|user|> being used for both and in <|endoftext|> not being treated as an end-of-text token.

This PR is a simple reversal of the two definitions: finetunes can keep overriding the EOS should they need to, while <|endoftext|> is treated as an EOT token.
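
For context, a minimal sketch of the kind of swap involved, assuming the GLM4 converter in convert_hf_to_gguf.py registers special tokens through gguf.SpecialVocab._set_special_token() and looks the token ids up via tokenizer.get_added_vocab() (the lines below are illustrative, not copied from the PR diff). The key detail is that _set_special_token() skips a token type that was already loaded from the model's config files, which is why the converter's EOS request loses to config.json while its EOT request sticks:

```python
# Before: the converter asks for EOS = <|endoftext|> and EOT = <|user|>.
# config.json already defines EOS (as <|user|>), so the EOS request is
# dropped, <|user|> ends up as both EOS and EOT, and <|endoftext|> is
# never marked as an end token.
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])

# After (this PR): the definitions are reversed. EOS remains whatever the
# model's config files say (so finetunes can still override it), and
# <|endoftext|> is registered as the EOT token.
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|user|>"])
special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|endoftext|>"])
```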

In my test conversion this gives the following result:
```
INFO:gguf.vocab:Setting special token type eos to 151336
INFO:gguf.vocab:Setting special token type pad to 151329
INFO:gguf.vocab:Setting special token type eot to 151329
INFO:gguf.vocab:Setting special token type unk to 151329
INFO:gguf.vocab:Setting special token type bos to 151329
```

Before this PR the result is:
```
INFO:gguf.vocab:Setting special token type eos to 151336
INFO:gguf.vocab:Setting special token type pad to 151329
INFO:gguf.vocab:Setting special token type eot to 151336
INFO:gguf.vocab:Setting special token type unk to 151329
INFO:gguf.vocab:Setting special token type bos to 151329
```

If you prefer to solve this in a different manner (such as forcing the EOS to be set according to the internal converter's definition), feel free to reject this PR and we can open an issue instead.
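
For reference, that alternative could look roughly like the hypothetical sketch below, assuming SpecialVocab keeps the ids it loaded from the config files in its special_token_ids dict (this is not part of this PR):

```python
# Force the converter's EOS definition to win over config.json by clearing
# any value already loaded from the HF config files before re-setting it.
special_vocab.special_token_ids.pop("eos", None)
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
```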

github-actions bot added the python (python script changes) label May 13, 2025
henk717 (Author) commented May 20, 2025

@ngxson I see you were involved with previous template fixes; what do you think of this change? Right now all public GGUFs have the <|endoftext|> token wrong, so some fix is needed to clean up the ecosystem.

ThiloteE (Contributor) commented
Is this something that needs to be fixed in llama.cpp? Would it make sense to open a pull request in GLM4's repo and change the config, or does that lead to regressions in other model use-cases?

ngxson (Collaborator) commented Aug 16, 2025

Honestly, I received too many GLM fixes at some point, and I worry that we'll end up with one PR that tries to fix something, then another PR that reverts the fix.

It's best to wait for the community to confirm if this works.
