
Conversation

henk717 commented May 13, 2025

When testing GLM4 models I noticed that they leak <|endoftext|>. This happens because the converter sets the EOS token to <|endoftext|> and the EOT token to <|user|>. The Hugging Face config.json defines the EOS as <|user|>, which results in <|user|> being used for both and in <|endoftext|> not being treated as an end-of-text token.

This PR is a simple reversal of the two definitions: finetunes can keep overriding the EOS should they need to, while <|endoftext|> is treated as an EOT token.
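
For context, a minimal sketch of the kind of swap involved, assuming the GLM4 converter in convert_hf_to_gguf.py registers special tokens through gguf.SpecialVocab._set_special_token() and looks the token ids up via tokenizer.get_added_vocab() (the lines below are illustrative, not copied from the PR diff). The key detail is that _set_special_token() skips a token type that was already loaded from the model's config files, which is why the converter's EOS request loses to config.json while its EOT request sticks:

```python
# Before: the converter asks for EOS = <|endoftext|> and EOT = <|user|>.
# config.json already defines EOS (as <|user|>), so the EOS request is
# dropped, <|user|> ends up as both EOS and EOT, and <|endoftext|> is
# never marked as an end token.
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])

# After (this PR): the definitions are reversed. EOS remains whatever the
# model's config files say (so finetunes can still override it), and
# <|endoftext|> is registered as the EOT token.
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|user|>"])
special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|endoftext|>"])
```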

In my test conversion this gives the following result:
```
INFO:gguf.vocab:Setting special token type eos to 151336
INFO:gguf.vocab:Setting special token type pad to 151329
INFO:gguf.vocab:Setting special token type eot to 151329
INFO:gguf.vocab:Setting special token type unk to 151329
INFO:gguf.vocab:Setting special token type bos to 151329
```

Before this PR the result is:
```
INFO:gguf.vocab:Setting special token type eos to 151336
INFO:gguf.vocab:Setting special token type pad to 151329
INFO:gguf.vocab:Setting special token type eot to 151336
INFO:gguf.vocab:Setting special token type unk to 151329
INFO:gguf.vocab:Setting special token type bos to 151329
```

If you prefer to solve this in a different manner (such as forcing the EOS to be set according to the internal converter's definition), feel free to reject this PR and we can open an issue instead.
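
For reference, that alternative could look roughly like the hypothetical sketch below, assuming SpecialVocab keeps the ids it loaded from the config files in its special_token_ids dict (this is not part of this PR):

```python
# Force the converter's EOS definition to win over config.json by clearing
# any value already loaded from the HF config files before re-setting it.
special_vocab.special_token_ids.pop("eos", None)
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
```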

github-actions bot added the python (python script changes) label May 13, 2025
henk717 (Author) commented May 20, 2025

@ngxson I see you were involved with previous template fixes; what do you think of this change? Right now all public GGUFs have the <|endoftext|> token wrong, so some fix is needed to clean up the ecosystem.

ThiloteE (Contributor) commented
Is this something that needs to be fixed in llama.cpp? Would it make sense to open a pull request in GLM4's repo and change the config, or does that lead to regressions in other model use-cases?

ngxson (Collaborator) commented Aug 16, 2025

Honestly, I received too many GLM fixes at some point, and I worry that we'll end up with one PR that tries to fix something, then another PR that reverts the fix.

It's best to wait for the community to confirm if this works.
