[Stop Sequences] support stop sequences #2712

zoooo0820 · 2025-07-04T08:17:24Z

This PR adds support for stop sequences: specific tokens or phrases that signal the model to terminate the current generation.
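As an illustration of the idea only, here is a minimal sketch of suffix matching on token IDs. It is not the kernel or scheduler logic from this PR; the helper names `find_stop_match` and `generate_until_stop` are hypothetical. The sketch keeps the matched stop tokens in the output, which agrees with the sample response further down, where the content ends with the stop string.

```python
from typing import Iterable, List, Optional, Sequence


def find_stop_match(generated: Sequence[int],
                    stop_seqs: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the index of the first stop sequence that `generated` ends with,
    or None if no stop sequence matches. (Hypothetical helper.)"""
    for idx, seq in enumerate(stop_seqs):
        n = len(seq)
        # Skip empty sequences; compare the last n generated tokens.
        if n and len(generated) >= n and list(generated[-n:]) == list(seq):
            return idx
    return None


def generate_until_stop(token_stream: Iterable[int],
                        stop_seqs: Sequence[Sequence[int]]) -> List[int]:
    """Consume tokens one at a time and stop as soon as the output ends
    with any stop sequence. The matched tokens are kept in the output."""
    out: List[int] = []
    for tok in token_stream:
        out.append(tok)
        if find_stop_match(out, stop_seqs) is not None:
            break
    return out


tokens = generate_until_stop([5, 1, 2, 3, 9], stop_seqs=[[2, 3]])
print(tokens)
```

Checking after every token costs a comparison per stop sequence per step, which is presumably one reason to bound both the number and the length of stop sequences.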

Related environment variables

FD_USE_STOP_SEQ: whether to enable stop sequences; default is 0
FD_STOP_SEQS_MAX_LEN: maximum length of a stop sequence; default is 8
FD_MAX_STOP_SEQS_NUM: maximum number of stop sequences; default is 5
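To make the limits concrete, here is a rough sketch of reading these variables and checking a request against them. The variable names and defaults come from the list above; everything else is an assumption: the helpers `read_limits` and `check_stop_seqs` are hypothetical, the length is assumed to be counted in tokens, and raising an error on violation is assumed behavior that FastDeploy may not share.

```python
import os
from typing import Mapping, Sequence, Tuple


def read_limits(env: Mapping[str, str] = os.environ) -> Tuple[bool, int, int]:
    """Read the three variables, falling back to the documented defaults."""
    enabled = env.get("FD_USE_STOP_SEQ", "0") == "1"
    max_len = int(env.get("FD_STOP_SEQS_MAX_LEN", "8"))
    max_num = int(env.get("FD_MAX_STOP_SEQS_NUM", "5"))
    return enabled, max_len, max_num


def check_stop_seqs(stop_token_seqs: Sequence[Sequence[int]],
                    max_len: int, max_num: int) -> None:
    """Raise ValueError if the request exceeds either limit.
    (Assumed behavior: the real service may reject or truncate differently.)"""
    if len(stop_token_seqs) > max_num:
        raise ValueError(
            f"{len(stop_token_seqs)} stop sequences given, at most {max_num} allowed")
    for seq in stop_token_seqs:
        if len(seq) > max_len:
            raise ValueError(
                f"stop sequence of {len(seq)} tokens exceeds the limit of {max_len}")


enabled, max_len, max_num = read_limits()
check_stop_seqs([[1525], [19770]], max_len, max_num)
```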

Usage

online serving

# launch serving

export FD_USE_STOP_SEQ=1   # set the environment variable to 1
python -m fastdeploy.entrypoints.openai.api_server \
    --model $YOUR_MODEL_PATH \
    --port 8233 \
    --engine-worker-queue-port 8234 \
    --metrics-port 8235 \
    --tensor-parallel-size 1 \
    --max-num-seqs 256 \
    --max-model-len 32768

# create a chat request with "stop" parameter
import openai
ip = "0.0.0.0"
service_http_port = "8233"
client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": '今天天气真好'},
    ],
    temperature=0.8,
    top_p=0.8,
    stream=False,
    stop=["天气"]
)

The response is shown below; generation terminates at the stop sequence:

ChatCompletion(id='chatcmpl-ea74c86a-0ee2-4fa6-bd75-bf2d199fdf2d', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='是呀，天气', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content=None))], created=1751616717, model='default', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=4, prompt_tokens=10, total_tokens=14, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0)))

Alternatively, pass stop_token_ids:

import requests

ip = "0.0.0.0"
service_http_port = "8233"    # port configured for the service

api_url = f"http://{ip}:{service_http_port}/v1/chat/completions"
prompt = "今天天气真好"
payload = {
    "messages": [
        {"role": "user", "content": prompt}
    ],
    "temperature": 0.8,
    "top_p": 0,
    "stop_token_ids": [1525, 19770]
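}

# send the request and print the JSON response
response = requests.post(api_url, json=payload)
print(response.json())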

offline inference

Run with FD_USE_STOP_SEQ=1 exported in the environment:

from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = YOUR_MODEL_PATH

# give a "stop" parameter
sampling_params = SamplingParams(temperature=0.8, stop=["天气"])
llm = LLM(model=model_name_or_path, tensor_parallel_size=1)
output = llm.generate(prompts="今天天气真好",
                      use_tqdm=True,
                      sampling_params=sampling_params)

print(output)

[RequestOutput(request_id=d2b70ec7-b232-466e-85d3-e9b379c2a548, prompt='今天天气真好', prompt_token_ids=[3507, 13876, 24034], outputs=CompletionOutput(index=0, send_idx=0, text='，八十岁的奶奶带着三岁的小孙女去公园玩。奶奶牵着小孙女的手，看着周围的花花草草，脸上洋溢着幸福的笑容。\n\n请根据这段描述，写一个关于祖孙俩在公园的温馨故事。\n# 公园里的幸福时光\n天气', token_ids=[93956, 50890, 16844, 13307, 9645, 78778, 4224, 95834, 94344, 94148, 11830, 94642, 93977, 13307, 96602, 92784, 95834, 94344, 13704, 93956, 5556, 11138, 25991, 81489, 95107, 93956, 15082, 89960, 94142, 42396, 27058, 93977, 23, 23, 94515, 2563, 30770, 5434, 93956, 94667, 748, 5175, 95436, 95834, 96415, 94004, 11830, 93964, 30482, 4886, 93977, 23, 93993, 93919, 11830, 5705, 7849, 18272, 23, 13876, 2], draft_token_ids=[], reasoning_content=None, metrics=RequestMetrics(arrival_time=1751617613.0714405, inference_start_time=1751617613.0740652, first_token_time=0.08782720565795898, time_in_queue=0.0013637542724609375, preprocess_cost_time=0.0005915164947509766, model_forward_time=0.8258647918701172, model_execute_time=0.8284895420074463, request_start_time=None)

paddle-bot · 2025-07-04T08:17:32Z

Thanks for your contribution!

Jiang-Jia-Jun · 2025-07-06T02:31:39Z

Does the FD_USE_STOP_SEQ environment variable really need to exist? In vLLM, the stop/stop_token_ids parameters appear to be supported by default, with no limit on length.

support stop_reqs

bb88003

zoooo0820 force-pushed the support_stop_req branch from f40ba7f to bb88003 Compare July 4, 2025 08:46

support stop_token_ids

2781e51
