sampling: make top_n_sigma no-op at <=0 rather than <0 #13345

DocShotgun · 2025-05-06T17:18:48Z

This changes the behavior of the recently-added top_n_sigma sampler to a short-circuit no-op state at values <= 0 rather than < 0. The rationale for this change is as follows:

* The current behavior of top_n_sigma == 0 is redundant: it is a roundabout way to achieve greedy decoding, which can already be specified by other means, e.g. top_k == 1.
* top_n_sigma == 0 represents a no-op rather than greedy decoding in other existing tooling (e.g. text-generation-webui, aphrodite-engine, koboldcpp, YALS), so this change keeps the interface consistent for frontend developers.
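For readers unfamiliar with the sampler, here is a minimal sketch of the proposed guard. It assumes the usual top-n-sigma rule (keep tokens whose logit is at least `max - n * sigma`, where `sigma` is the standard deviation of the logits) and works on a plain vector of logits. The function name `top_n_sigma_sketch` and its return value are hypothetical and are not llama.cpp's API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical illustration only, not the llama.cpp implementation.
// Masks logits below (max - n * sigma) to -infinity, in place.
// Returns false when the sampler short-circuits and leaves the logits untouched.
bool top_n_sigma_sketch(std::vector<float> & logits, float n) {
    // Proposed guard: n <= 0 is a no-op (previously only n < 0 was).
    // The size check skips the work when at most one candidate remains.
    if (n <= 0.0f || logits.size() <= 1) {
        return false;
    }

    // Find the maximum logit and the mean.
    float max_logit = logits[0];
    float mean      = 0.0f;
    for (float l : logits) {
        max_logit = std::max(max_logit, l);
        mean += l;
    }
    mean /= static_cast<float>(logits.size());

    // Population standard deviation of the logits.
    float var = 0.0f;
    for (float l : logits) {
        var += (l - mean) * (l - mean);
    }
    const float sigma = std::sqrt(var / static_cast<float>(logits.size()));

    // Mask everything below the threshold.
    const float threshold = max_logit - n * sigma;
    for (float & l : logits) {
        if (l < threshold) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
    return true;
}
```

Without the guard, `n == 0` puts the threshold at the maximum logit itself, so only the top token survives. That is the greedy behavior the first bullet calls redundant.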

CISC · 2025-05-06T17:34:54Z

~~Do you know the rationale for koboldcpp also checking for cur_p->size <= 1?~~

Well, looking closer I see why, so perhaps add that too?

src/llama-sampling.cpp

* avoid running nsigma when only a single candidate remains Co-authored-by: Sigbjørn Skjæret <[email protected]>

CISC · 2025-05-06T18:10:42Z

Ouch, test-sampling fails, can you look into why?

DocShotgun · 2025-05-06T18:42:45Z

> Ouch, test-sampling fails, can you look into why?

I took a look at it, and as far as I can tell, it's because this line:

llama.cpp/tests/test-sampling.cpp

Line 363 in 91a86a6

test_top_n_sigma({0.1f, 0.2f, 0.3f, 0.4f}, {1.0f, 0.0f, 0.0f, 0.0f}, 0.00f);

explicitly checks for top_n_sigma == 0 leading to greedy decoding, and that behavior is changed by this PR.

If I change the test to check for no-op instead, it passes:

test_top_n_sigma({0.1f, 0.2f, 0.3f, 0.4f}, {0.4f, 0.3f, 0.2f, 0.1f}, 0.00f);
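To see why the no-op expectation is the input distribution in descending order, here is a small sketch. It assumes the test helper derives logits from the input probabilities, applies the sampler, and compares the resulting probabilities sorted from highest to lowest. That is an inference from the expected vectors above, not a description of the real helper. The function name `probs_after_noop_sketch` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical stand-in for the check, not the real test_top_n_sigma helper.
// Converts probabilities to logits, applies a no-op sampler, then
// renormalizes with softmax and sorts in descending order.
std::vector<float> probs_after_noop_sketch(const std::vector<float> & probs) {
    std::vector<float> logits;
    for (float p : probs) {
        logits.push_back(std::log(p));
    }

    // A no-op sampler leaves the logits untouched at this point.

    // Numerically stable softmax over the unchanged logits.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> out;
    float sum = 0.0f;
    for (float l : logits) {
        out.push_back(std::exp(l - max_logit));
        sum += out.back();
    }
    for (float & o : out) {
        o /= sum;
    }

    std::sort(out.begin(), out.end(), std::greater<float>());
    return out;
}
```

Under these assumptions, the input {0.1f, 0.2f, 0.3f, 0.4f} comes back as approximately {0.4f, 0.3f, 0.2f, 0.1f}, whereas greedy decoding would concentrate all probability on the top token.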

* adjust the sampling test to reflect top_n_sigma == 0 behaving as no-op rather than greedy decoding

CISC · 2025-05-06T19:14:22Z

> If I change the test to check for no-op instead, it passes

Great, can you also add a comment explaining why it was changed?

sampling: make nsigma == 0 a no-op

6d6877d

DocShotgun mentioned this pull request May 6, 2025

Updates/fixes for llama.cpp textgen settings SillyTavern/SillyTavern#3961

Merged

1 task

CISC requested changes May 6, 2025


sampling: short-circuit nsigma when cur_p <= 1

62e51ab


CISC approved these changes May 6, 2025


sampling: fix top_n_sigma == 0 test to reflect new behavior

699256a


github-actions bot added the testing Everything test related label May 6, 2025

sampling: additional context to top_n_sigma test change

91644d8

CISC merged commit ffc7272 into ggml-org:master May 6, 2025
45 checks passed
