feat: Add <think> at the beginning #1629
base: alpha
Conversation
Walkthrough

Adds a new addThinkFirst/add_think_first setting across the DTO, the backend (OpenAI relay streaming and non-streaming paths), and the web UI to optionally prepend "<think>\n" at the start of responses (the first chunk for streaming). Wires the flag through the Gemini→OpenAI stream formatter. Updates docs and i18n (en, zh-CN).
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Client
participant API as Relay API
participant OpenAI as Upstream (OpenAI-compatible)
rect rgba(230,245,255,0.6)
note over API: Streaming flow with addThinkFirst=true
Client->>API: POST /chat (stream=true)
API->>OpenAI: Forward request
OpenAI-->>API: Stream chunks (delta)
API->>API: firstChunk=true? addThink = firstChunk && AddThinkFirst
API->>Client: First chunk with "<think>\n" + delta (if addThink)
loop Remaining chunks
OpenAI-->>API: Next chunk
API->>Client: Pass-through (no additional "<think>")
end
API-->>Client: [DONE]
end
```

```mermaid
sequenceDiagram
autonumber
participant Client
participant API as Relay API
participant OpenAI as Upstream (OpenAI-compatible)
rect rgba(240,255,240,0.6)
note over API: Non-streaming flow with addThinkFirst=true
Client->>API: POST /chat (stream=false)
API->>OpenAI: Forward request
OpenAI-->>API: JSON response
API->>API: If AddThinkFirst, prefix "<think>\n" to message content
API-->>Client: Modified JSON response
end
```
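To make the first-chunk gating in the diagrams concrete, here is a minimal, runnable Go sketch of the decision logic. The relayStream helper and the plain-string chunks are illustrative assumptions — the actual relay operates on OpenAI delta structs inside the stream scanner.

```go
package main

import "fmt"

// relayStream simulates the streaming decision shown above: only the first
// delta of a stream is prefixed with "<think>\n" when the flag is enabled;
// every later chunk passes through untouched.
func relayStream(deltas []string, addThinkFirst bool) []string {
	out := make([]string, 0, len(deltas))
	firstChunk := true
	for _, delta := range deltas {
		if firstChunk && addThinkFirst {
			delta = "<think>\n" + delta
		}
		firstChunk = false
		out = append(out, delta)
	}
	return out
}

func main() {
	fmt.Printf("%q\n", relayStream([]string{"reasoning…", "</think>", "answer"}, true))
	// ["<think>\nreasoning…" "</think>" "answer"]
}
```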
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
🧹 Nitpick comments (10)
dto/channel_settings.go (1)

10-10: AddThinkFirst wiring looks good; add a brief doc comment and confirm feature precedence

The field/tag are correct and backward compatible (`omitempty`). For maintainability, document that this applies only to OpenAI channel types and what "first" means in streaming vs. non-streaming. Also confirm the precedence with the existing ThinkingToContent/ForceFormat so we don't double-prepend when content already starts with `<think>`.

Suggested inline doc comment:
```diff
 type ChannelSettings struct {
 	ForceFormat            bool   `json:"force_format,omitempty"`
 	ThinkingToContent      bool   `json:"thinking_to_content,omitempty"`
 	Proxy                  string `json:"proxy"`
 	PassThroughBodyEnabled bool   `json:"pass_through_body_enabled,omitempty"`
 	SystemPrompt           string `json:"system_prompt,omitempty"`
 	SystemPromptOverride   bool   `json:"system_prompt_override,omitempty"`
-	AddThinkFirst          bool   `json:"add_think_first,omitempty"`
+	// AddThinkFirst: prepend "<think>\n" to the first assistant output
+	// (streaming: first chunk; non-streaming: start of content).
+	// Only applies to OpenAI channel types.
+	AddThinkFirst          bool   `json:"add_think_first,omitempty"`
 }
```

Verification checklist:
- When both ThinkingToContent and AddThinkFirst are enabled, ensure only a single leading `<think>\n` is emitted, and that it is not duplicated when the model output already starts with `<think>`.
- Confirm the UI saves/loads `add_think_first` into this struct via the settings JSON.

web/src/i18n/locales/en.json (1)
1350-1351: Tweak phrasing for clearer English

Minor copy nit to read more naturally and consistently with other labels.

Apply this diff:
- "开头补充<think>": "Add <think> at the beginning", - "将<think>\\n拼接到响应的开头(只适用于OpenAI渠道类型)": "Add <think>\\n at the beginning of the response (Only for OpenAI channel types)", + "开头补充<think>": "Prepend <think> at the beginning", + "将<think>\\n拼接到响应的开头(只适用于OpenAI渠道类型)": "Prepend <think>\\n to responses (OpenAI channels only)",relay/channel/openai/helper.go (1)
21-21: Add a short doc comment describing addThink semantics

The signature change is fine; add a one-liner so future readers know when and how the flag is applied.

Apply this diff:
```diff
-// 辅助函数
+// 辅助函数
+// HandleStreamFormat forwards an OpenAI-like stream chunk to the target format.
+// addThink: when true, prepend "<think>\n" to the first assistant chunk (OpenAI format only).
 func HandleStreamFormat(c *gin.Context, info *relaycommon.RelayInfo, data string, forceFormat bool, thinkToContent bool, addThink bool) error {
```

docs/channel/other_setting.md (2)
9-11: Clarify scope and behavior of "思考内容转换" (thinking-content conversion)

Document that this conversion currently applies to streaming responses (the OpenAI relay path) and that the relay will emit a closing tag when transitioning from reasoning to content. This avoids confusion for non-streaming responses, where reasoning-to-content conversion may not occur.

Suggested patch:
```diff
-2. 思考内容转换
-    - 用于标识是否将思考内容`reasoning_content`转换为`<think>`标签拼接到内容中返回
+2. 思考内容转换
+    - 将思考内容 `reasoning_content` 转换为 `<think>` 标签拼接到内容中(流式响应生效),并在切换到正常内容时自动补充 `</think>` 结束标签。
+    - 说明:当前主要在 OpenAI 流式转发路径生效;非流式响应不一定进行 reasoning 到 content 的转换。
```
25-27: State "add_think_first" applicability and idempotency

The code inserts the prefix in OpenAI relay paths. Please make this explicit in the doc, and note that the relay will avoid duplicating the tag if the upstream content already starts with `<think>`.

Suggested patch:
```diff
-7. add_think_first
-    - 在通过vllm和sglang自部署模型使用一些自带`<think>\n`的`chat_template`时,将`<think>\n`标签拼接到响应的开头
+7. add_think_first
+    - 适用范围:仅 OpenAI 渠道(包括流式与非流式)。
+    - 场景:在通过 vLLM / sglang 自部署且使用包含 `"<think>\n"` 的 chat_template 时,保证响应以 `"<think>\n"` 开头,便于下游识别思考内容。
+    - 去重:若上游内容已以 `<think>` 开头,转发层不会重复添加。
```

web/src/components/table/channels/modals/EditChannelModal.jsx (2)
176-192: Avoid state staleness when building the setting JSON

Because setState is async, deriving settingsJson from the possibly stale channelSettings can drop concurrent toggles. Build and persist the new object from the same `prev` snapshot to keep the update atomic.
```diff
-const handleChannelSettingsChange = (key, value) => {
-  // 更新内部状态
-  setChannelSettings(prev => ({ ...prev, [key]: value }));
-
-  // 同步更新到表单字段
-  if (formApiRef.current) {
-    formApiRef.current.setValue(key, value);
-  }
-
-  // 同步更新inputs状态
-  setInputs(prev => ({ ...prev, [key]: value }));
-
-  // 生成setting JSON并更新
-  const newSettings = { ...channelSettings, [key]: value };
-  const settingsJson = JSON.stringify(newSettings);
-  handleInputChange('setting', settingsJson);
-};
+const handleChannelSettingsChange = (key, value) => {
+  setChannelSettings(prev => {
+    const next = { ...prev, [key]: value };
+    // 同步更新到表单字段
+    formApiRef.current?.setValue(key, value);
+    // 同步更新inputs状态
+    setInputs(p => ({ ...p, [key]: value }));
+    // 生成 setting JSON 并更新到 inputs.setting
+    handleInputChange('setting', JSON.stringify(next));
+    return next;
+  });
+};
```
1800-1807: Gate the switch to OpenAI channels to match actual applicability

The UI currently shows the toggle for all channel types, but the behavior is implemented only in the OpenAI relay path. To avoid confusion, render this switch only when inputs.type === 1 (OpenAI), mirroring "强制格式化" (force format).
```diff
-<Form.Switch
-  field='add_think_first'
-  label={t('开头补充<think>')}
-  checkedText={t('开')}
-  uncheckedText={t('关')}
-  onChange={(value) => handleChannelSettingsChange('add_think_first', value)}
-  extraText={t('将<think>\\n拼接到响应的开头(只适用于OpenAI渠道类型)')}
-/>
+{inputs.type === 1 && (
+  <Form.Switch
+    field='add_think_first'
+    label={t('开头补充<think>')}
+    checkedText={t('开')}
+    uncheckedText={t('关')}
+    onChange={(value) => handleChannelSettingsChange('add_think_first', value)}
+    extraText={t('将<think>\\n拼接到响应的开头(只适用于OpenAI渠道类型)')}
+  />
+)}
```

relay/channel/openai/relay-openai.go (3)
43-52: Make prefixing idempotent to avoid a duplicate `<think>`

If upstream content already begins with `<think>`, prefixing again will duplicate the tag (especially in non-streaming paths or with custom chat_templates). Guard with a prefix check.
```diff
-			if addThink {
-				for i := range lastStreamResponse.Choices {
-					var content string
-					if lastStreamResponse.Choices[i].Delta.Content != nil {
-						content = *lastStreamResponse.Choices[i].Delta.Content
-					}
-					newContent := "<think>\n" + content
-					lastStreamResponse.Choices[i].Delta.Content = &newContent
-				}
-			}
+			if addThink {
+				for i := range lastStreamResponse.Choices {
+					var content string
+					if lastStreamResponse.Choices[i].Delta.Content != nil {
+						content = *lastStreamResponse.Choices[i].Delta.Content
+					}
+					trimmed := strings.TrimLeft(content, " \t\r\n")
+					if !strings.HasPrefix(trimmed, "<think>") {
+						newContent := "<think>\n" + content
+						lastStreamResponse.Choices[i].Delta.Content = &newContent
+					}
+				}
+			}
```
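Pulled out for illustration, a self-contained sketch of the guard suggested above; prependThink is a hypothetical helper name, not code from this PR:

```go
package main

import (
	"fmt"
	"strings"
)

// prependThink adds the "<think>\n" prefix only when the content does not
// already begin with a <think> tag (ignoring leading whitespace), so
// applying it twice changes nothing.
func prependThink(content string) string {
	trimmed := strings.TrimLeft(content, " \t\r\n")
	if strings.HasPrefix(trimmed, "<think>") {
		return content
	}
	return "<think>\n" + content
}

func main() {
	fmt.Printf("%q\n", prependThink("plain answer"))         // "<think>\nplain answer"
	fmt.Printf("%q\n", prependThink("<think>reasoning"))     // unchanged
	fmt.Printf("%q\n", prependThink("  <think>with spaces")) // unchanged
}
```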
139-146: Simplify boolean assignment

Minor readability improvement: assign directly from the setting.
```diff
-	var addThinkFirst bool
-
-	if info.ChannelSetting.AddThinkFirst {
-		addThinkFirst = true
-	}
+	addThinkFirst := info.ChannelSetting.AddThinkFirst

 	var addThink bool
 	var firstChunk = true
```
219-246: Non-streaming: make addThinkFirst idempotent and avoid clobbering when the tag is already present

Also simplify the boolean assignment.
```diff
-	addThinkFirst := false
-	if info.ChannelSetting.AddThinkFirst {
-		addThinkFirst = true
-	}
+	addThinkFirst := info.ChannelSetting.AddThinkFirst
@@
-	if addThinkFirst {
-		for i := range simpleResponse.Choices {
-			newContent := "<think>\n" + simpleResponse.Choices[i].Message.StringContent()
-			simpleResponse.Choices[i].Message.Content = &newContent
-		}
-		responseBody, err = common.Marshal(simpleResponse)
-		if err != nil {
-			return nil, types.NewError(err, types.ErrorCodeBadResponseBody)
-		}
-	}
+	if addThinkFirst {
+		changed := false
+		for i := range simpleResponse.Choices {
+			curr := simpleResponse.Choices[i].Message.StringContent()
+			if !strings.HasPrefix(strings.TrimLeft(curr, " \t\r\n"), "<think>") {
+				newContent := "<think>\n" + curr
+				simpleResponse.Choices[i].Message.Content = &newContent
+				changed = true
+			}
+		}
+		if changed {
+			responseBody, err = common.Marshal(simpleResponse)
+			if err != nil {
+				return nil, types.NewError(err, types.ErrorCodeBadResponseBody)
+			}
+		}
+	}
```
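For intuition, a self-contained sketch of the non-streaming transformation end to end: unmarshal, guard, prefix, re-marshal. The message/choice/response types are simplified stand-ins for the project's real DTOs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Simplified stand-ins for the OpenAI response shape.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}
type choice struct {
	Message message `json:"message"`
}
type response struct {
	Choices []choice `json:"choices"`
}

func main() {
	body := `{"choices":[{"message":{"role":"assistant","content":"hello"}}]}`
	var resp response
	if err := json.Unmarshal([]byte(body), &resp); err != nil {
		panic(err)
	}
	// Prefix each choice's content unless it already starts with <think>.
	for i := range resp.Choices {
		curr := resp.Choices[i].Message.Content
		if !strings.HasPrefix(strings.TrimLeft(curr, " \t\r\n"), "<think>") {
			resp.Choices[i].Message.Content = "<think>\n" + curr
		}
	}
	out, _ := json.Marshal(resp)
	fmt.Println(string(out)) // content now begins with "<think>\n"
}
```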
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)

- docs/channel/other_setting.md (1 hunks)
- dto/channel_settings.go (1 hunks)
- i18n/zh-cn.json (1 hunks)
- relay/channel/gemini/relay-gemini.go (1 hunks)
- relay/channel/openai/helper.go (1 hunks)
- relay/channel/openai/relay-openai.go (6 hunks)
- web/src/components/table/channels/modals/EditChannelModal.jsx (10 hunks)
- web/src/i18n/locales/en.json (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-21T03:37:41.726Z
Learnt from: 9Ninety
PR: QuantumNous/new-api#1273
File: relay/channel/gemini/relay-gemini.go:97-116
Timestamp: 2025-06-21T03:37:41.726Z
Learning: In relay/channel/gemini/relay-gemini.go, the thinking budget calculation logic (including the MaxOutputTokens multiplication) was introduced in PR #1247. PR #1273 focused specifically on decoupling the thoughts summary feature from thinking budget settings and did not modify the existing thinking budget behavior.
Applied to files:
relay/channel/openai/relay-openai.go
🧬 Code graph analysis (2)

relay/channel/openai/helper.go (2)
- relay/common/relay_info.go (1): RelayInfo (73-118)
- types/relay_format.go (2): RelayFormat (3-3), RelayFormatOpenAI (6-6)

relay/channel/openai/relay-openai.go (3)
- relay/helper/stream_scanner.go (1): StreamScannerHandler (29-262)
- relay/channel/openai/helper.go (1): HandleStreamFormat (21-33)
- dto/openai_request.go (1): Message (247-258)
🔇 Additional comments (8)
i18n/zh-cn.json (1)

601-602: LGTM: zh-CN entries for think-first

The new entries are properly escaped (`\\n`) and consistent with the existing i18n style.

relay/channel/gemini/relay-gemini.go (1)
876-876: All HandleStreamFormat call sites include the new boolean flag

Verified that both the OpenAI and Gemini streams invoke `HandleStreamFormat` with six arguments, matching the updated signature and correctly passing `addThink` for OpenAI and `false` for Gemini.

- relay/channel/openai/relay-openai.go:151 – `HandleStreamFormat(…, addThink)`
- relay/channel/gemini/relay-gemini.go:876 – `HandleStreamFormat(…, false)`
- relay/channel/openai/helper.go:21 – signature defines `addThink bool`

No further changes required.
relay/channel/openai/helper.go (1)
26-26: The `addThink` branch unconditionally prepends `<think>\n` on the first chunk, without guarding against existing tags. We still need to review the `thinkToContent` branch to confirm that it merges thinking tags and content in the expected order and doesn't introduce duplicates when both settings are active.

Let's inspect the remainder of `sendStreamData` (the `thinkToContent` handling) to verify:

- that existing `<think>` markers aren't re-added;
- that content and thinking segments are ordered correctly.

Once we confirm that, we can close out this check or flag any required fixes.
web/src/components/table/channels/modals/EditChannelModal.jsx (4)
135-137: Default value wired correctly

add_think_first is added to originInputs with a sensible default (false). No issues.
163-171: Initialize channelSettings with add_think_first

Good alignment between form state and persisted setting keys.
342-361: Robust defaulting looks fine

Parsing of data.setting and the fallbacks to false/'' are OK and consistent for add_think_first.
739-759: Persisted JSON includes add_think_first and excludes transient fields

The shape matches backend DTO expectations (`json:"add_think_first,omitempty"`). LGTM.
relay/channel/openai/relay-openai.go (1)
29-37: Early passthrough guard updated correctly

Including addThink in the guard ensures we intercept only when needed. Looks good.
```diff
 if info.RelayFormat == types.RelayFormatOpenAI {
 	if shouldSendLastResp {
-		_ = sendStreamData(c, info, lastStreamData, info.ChannelSetting.ForceFormat, info.ChannelSetting.ThinkingToContent)
+		_ = sendStreamData(c, info, lastStreamData, info.ChannelSetting.ForceFormat, info.ChannelSetting.ThinkingToContent, false)
 	}
 }
```
Single-chunk streaming bug: prefix not applied to the only chunk

When the stream consists of a single data chunk, the first (and only) chunk is flushed in the "last response" block with addThink=false, so `<think>\n` is never prepended. Fix by reusing the same firstChunk && addThinkFirst logic for the final flush.
```diff
-	if shouldSendLastResp {
-		_ = sendStreamData(c, info, lastStreamData, info.ChannelSetting.ForceFormat, info.ChannelSetting.ThinkingToContent, false)
-	}
+	if shouldSendLastResp {
+		// Apply think-first if the last chunk is also the first (single-chunk stream)
+		_ = sendStreamData(
+			c,
+			info,
+			lastStreamData,
+			info.ChannelSetting.ForceFormat,
+			info.ChannelSetting.ThinkingToContent,
+			addThinkFirst && firstChunk,
+		)
+	}
```
The last chunk doesn't need `<think>\n`.
@jiajunly Thank you for the clarification! You're absolutely right.

Looking at the streaming logic more carefully, I can see that the `<think>\n` prefix is only intended for the very first chunk of the stream, and the final flush with addThink=false is correct by design. Even in single-chunk scenarios, the prefix would have been applied during the normal scanner processing loop, not during the final response handling.

My analysis was incorrect: the current implementation properly ensures only the first chunk gets the prefix, and the last chunk (whether part of a multi-chunk or single-chunk stream) correctly doesn't receive the prefix.
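A simplified, hypothetical model of the loop semantics described here — not the actual StreamScannerHandler — showing why even a single-chunk stream picks up the prefix inside the scan loop, leaving the final flush free to pass addThink=false:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	const addThinkFirst = true
	// One-chunk stream: the prefix decision happens inside the scan loop.
	scanner := bufio.NewScanner(strings.NewReader("only-chunk\n"))
	firstChunk := true
	for scanner.Scan() {
		chunk := scanner.Text()
		if firstChunk && addThinkFirst {
			chunk = "<think>\n" + chunk
		}
		firstChunk = false
		fmt.Printf("send: %q\n", chunk) // send: "<think>\nonly-chunk"
	}
	// Final flush: by design, never the place to add the prefix.
	fmt.Println("flush last response with addThink=false")
}
```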
✏️ Learnings added
Learnt from: jiajunly
PR: QuantumNous/new-api#1629
File: relay/channel/openai/relay-openai.go:170-174
Timestamp: 2025-08-21T06:31:11.045Z
Learning: In relay/channel/openai/relay-openai.go, the streaming logic for the AddThinkFirst feature is designed so that only the first chunk of a stream gets the "<think>\n" prefix. The final flush in the streaming handler intentionally uses addThink=false because the last chunk should never receive the prefix, even in single-chunk streams where the prefix would have been applied during normal processing.
fix #1628
Summary by CodeRabbit

New Features
- Optional add_think_first channel setting that prepends `<think>\n` to responses (streaming and non-streaming OpenAI paths), with a matching toggle in the channel edit UI.

Documentation
- docs/channel/other_setting.md updated to describe the new setting.

Chores
- i18n entries added for the new setting (en, zh-CN).