Eval bug: Qwen3 30B adds spaces to end of each line #13508

Closed
Nepherpitou opened this issue May 13, 2025 · 2 comments

@Nepherpitou

Name and Version

.\llamacpp\cuda12\llama-server.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from C:\Users\unat\llm\llamacpp\cuda12\ggml-cuda.dll
load_backend: loaded RPC backend from C:\Users\unat\llm\llamacpp\cuda12\ggml-rpc.dll
load_backend: loaded CPU backend from C:\Users\unat\llm\llamacpp\cuda12\ggml-cpu-icelake.dll
version: 5338 (43dfd74)
built with MSVC 19.29.30159.0 for Windows AMD64

Operating systems

Windows

GGML backends

Vulkan

Hardware

Ryzen 9 7900X
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes

Models

Qwen3 30B Q6_K_XL

Problem description & steps to reproduce

Start server command

./llamacpp/vulkan/llama-server.exe --jinja --reasoning-format deepseek --no-mmap --no-warmup --host 0.0.0.0 --port 5102 --metrics --slots -m ./models/Qwen3-30B-A3B-128K-UD-Q6_K_XL.gguf -ngl 99 --flash-attn --ctx-size 65536 -ctk q8_0 -ctv q8_0 --min-p 0 --top-k 20 --no-context-shift -dev VULKAN1,VULKAN2 -ts 100,100 -t 12 --log-colors

Post request

POST http://localhost:5102/v1/chat/completions
Content-Type: application/json

{
  "model": "qwen3-30b",
  "messages": [
    {
      "content": "Write twenty words. Each from new line.",
      "role": "user"
    }
  ],
  "stream_options": {
    "include_usage": true
  },
  "stream": false
}

Response

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nOkay, the user wants me to write twenty words, each on a new line. Let me start by thinking of different categories to make sure the words are varied. Maybe start with some common nouns, then verbs, adjectives, maybe a few adverbs or other parts of speech.\n\nFirst, \"apple\" is a simple noun. Then \"run\" as a verb. \"Happy\" as an adjective. \"Quickly\" as an adverb. \"Tree\" is another noun. \"Jump\" as a verb. \"Beautiful\" for adjective. \"Silently\" as adverb. \"Ocean\" noun. \"Sing\" verb. \"Brave\" adjective. \"Suddenly\" adverb. \"Mountain\" noun. \"Laugh\" verb. \"Dark\" adjective. \"Gently\" adverb. \"Light\" noun or adjective. \"Write\" verb. \"Strong\" adjective. \"Forever\" adverb or noun.\n\nWait, I need to check if each word is from a new line. Let me list them out one by one. Make sure there are exactly twenty. Also, avoid repeating the same parts of speech too much. Maybe mix them up. Let me count: apple, run, happy, quickly, tree, jump, beautiful, silently, ocean, sing, brave, suddenly, mountain, laugh, dark, gently, light, write, strong, forever. That's twenty. Each is on a new line. Should be okay. I'll present them as requested.\n</think>\n\napple  \nrun  \nhappy  \nquickly  \ntree  \njump  \nbeautiful  \nsilently  \nocean  \nsing  \nbrave  \nsuddenly  \nmountain  \nlaugh  \ndark  \ngently  \nlight  \nwrite  \nstrong  \nforever"
      }
    }
  ],
  "created": 1747143882,
  "model": "qwen3-30b",
  "system_fingerprint": "b5338-43dfd741",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 355,
    "prompt_tokens": 17,
    "total_tokens": 372
  },
  "id": "chatcmpl-BaFCh4keAPfFfbfDGrkLl20HMI4apMeO",
  "timings": {
    "prompt_n": 1,
    "prompt_ms": 146.114,
    "prompt_per_token_ms": 146.114,
    "prompt_per_second": 6.843971145817649,
    "predicted_n": 355,
    "predicted_ms": 5332.55,
    "predicted_per_token_ms": 15.021267605633803,
    "predicted_per_second": 66.57227780330236
  }
}

What's wrong

The response content after the thinking block is: \napple  \nrun  \nhappy  \nquickly  \ntree  \njump  \nbeautiful  \nsilently  \nocean  \nsing  \nbrave  \nsuddenly  \nmountain  \nlaugh  \ndark  \ngently  \nlight  \nwrite  \nstrong  \nforever. There are two spaces before each newline character. Every time, everywhere.
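If the trailing spaces are unwanted, they can be stripped on the client side after the response arrives. A minimal sketch (an illustrative helper, not part of llama.cpp or its API):

```python
import re

def strip_trailing_spaces(text: str) -> str:
    """Remove spaces/tabs immediately preceding each newline."""
    return re.sub(r"[ \t]+(?=\n)", "", text)

content = "apple  \nrun  \nhappy  \n"
print(repr(strip_trailing_spaces(content)))  # 'apple\nrun\nhappy\n'
```

This only post-processes the text; it does not change what the model generates.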

First Bad Commit

No response

Relevant log output

srv  params_from_: Chat format: Content-only
slot launch_slot_: id  0 | task 22047 | processing task
slot update_slots: id  0 | task 22047 | new prompt, n_ctx_slot = 65536, n_keep = 0, n_prompt_tokens = 17
slot update_slots: id  0 | task 22047 | need to evaluate at least 1 token to generate logits, n_past = 17, n_prompt_tokens = 17
slot update_slots: id  0 | task 22047 | kv cache rm [16, end)
slot update_slots: id  0 | task 22047 | prompt processing progress, n_past = 17, n_tokens = 1, progress = 0.058824
slot update_slots: id  0 | task 22047 | prompt done, n_past = 17, n_tokens = 1
slot      release: id  0 | task 22047 | stop processing: n_past = 454, truncated = 0
slot print_timing: id  0 | task 22047 |
prompt eval time =     145.62 ms /     1 tokens (  145.62 ms per token,     6.87 tokens per second)
       eval time =    6362.02 ms /   438 tokens (   14.53 ms per token,    68.85 tokens per second)
      total time =    6507.64 ms /   439 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
@pwilkin
Contributor

pwilkin commented May 13, 2025

It's a "feature" of the template, I think.

@cmdntfnd

cmdntfnd commented May 13, 2025

A lot of models these days focus on outputting nicely formatted Markdown. Markdown requires two spaces before `\n` to produce a proper line break, as explained here:
https://daringfireball.net/projects/markdown/syntax#p

> When you do want to insert a `<br />` break tag using Markdown, you end a line with two or more spaces, then type return.

Not a bug.
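The two trailing spaces in the response above are exactly this hard-break marker. A quick way to verify the pattern in a captured response (illustrative snippet, not part of any tooling):

```python
def is_markdown_hard_break(line: str) -> bool:
    # Markdown renders a line ending in two or more spaces as a <br /> hard break.
    return line.endswith("  ")

content = "apple  \nrun  \nhappy"
print([is_markdown_hard_break(l) for l in content.split("\n")])  # [True, True, False]
```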

@CISC CISC closed this as completed May 14, 2025