Feature: Add support for Pooling Model Embedding and provide an OpenAI-compatible API. #4344

sunlei1024 · 2025-10-10T06:35:35Z

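Summary

This PR adds support for pooling embedding models and provides an interface that is fully compatible with the OpenAI /v1/embeddings API.
The feature targets users who need high-performance semantic (sentence) embeddings, and supplies higher-quality embedding representations for downstream tasks such as search, clustering, and recommendation.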

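Main changes

1. New: Pooling Model Embedding support

New Feature: Implements low-level interface support for pooling models.
This lets the service convert an input sequence (for example, the token vectors of a text) into a fixed-dimension semantic embedding vector through an aggregation operation such as mean pooling.
Value: Extends the model types the system supports, so users can generate high-quality sentence embeddings with high-performance semantic models for search, clustering, and semantic retrieval.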
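
As an illustration of the aggregation step described above, here is a minimal mean-pooling sketch in plain Python. It is not the code in this PR: the function name mean_pool and the attention_mask argument are assumptions made for the example, and a real pooling model would run this on tensors.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the vectors of unmasked tokens into one fixed-size vector.

    Hypothetical helper for illustration only.
    """
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")

    # Keep only real tokens; padding positions carry mask value 0.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")

    dim = len(kept[0])
    # Element-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors of dimension 2; the last one is padding and is ignored.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
print(pooled)  # [2.0, 3.0]
```

The output length equals the hidden dimension regardless of how many tokens the input has, which is what makes the result usable as a fixed-size sentence embedding.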

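2. New: OpenAI-compatible API implementation

New Feature: Implements an API endpoint that is fully compatible with the OpenAI /v1/embeddings standard.
Compatible request formats: The endpoint accepts two request types so that existing OpenAI clients work without changes:
1. EmbeddingCompletionRequest: accepts input as a string or a list of strings.
2. EmbeddingChatRequest: accepts a messages list, for embedding chat-style context.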
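
The two request types differ in which field carries the content: input or messages. How the PR tells them apart internally is not shown here, so the sketch below is only an assumption about one way to do it. The helper name classify_embedding_request and the rule that exactly one of the two fields must be present are hypothetical.

```python
from typing import Any, Dict


def classify_embedding_request(payload: Dict[str, Any]) -> str:
    """Return the request type name implied by the payload's fields.

    Hypothetical helper; the real server-side validation may differ.
    """
    has_input = "input" in payload
    has_messages = "messages" in payload
    if has_input == has_messages:
        raise ValueError("provide exactly one of 'input' or 'messages'")

    if has_messages:
        if not isinstance(payload["messages"], list):
            raise ValueError("'messages' must be a list")
        return "EmbeddingChatRequest"

    value = payload["input"]
    is_str_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
    if isinstance(value, str) or is_str_list:
        return "EmbeddingCompletionRequest"
    raise ValueError("'input' must be a string or a list of strings")


print(classify_embedding_request({"model": "m", "input": ["a", "b"]}))
print(classify_embedding_request(
    {"model": "m", "messages": [{"role": "user", "content": "hi"}]}
))
```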

Testing (cURL examples)

A. EmbeddingCompletionRequest example (standard text input)

curl -X POST 'YOUR_SERVICE_URL/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "text-embedding-chat-model",
    "input": [
      "This is a sentence for pooling embedding.",
      "Another input text."
    ],
    "user": "test_client"
  }'

B. EmbeddingChatRequest example (message sequence input)

curl -X POST 'YOUR_SERVICE_URL/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "text-embedding-chat-model",
    "messages": [
      {"role": "user", "content": "Generate embedding for user query."}
    ]
  }'
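
For readers who prefer a script to cURL, the same chat-style request can be built with Python's standard library. This is a sketch under assumptions: the base URL http://localhost:8060 stands in for YOUR_SERVICE_URL, and build_embeddings_request is a hypothetical helper name. The request is only constructed here; the commented lines show how it could be sent to a running service.

```python
import json
import urllib.request
from typing import Any, Dict


def build_embeddings_request(service_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for the /v1/embeddings endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


chat_payload = {
    "model": "text-embedding-chat-model",
    "messages": [
        {"role": "user", "content": "Generate embedding for user query."}
    ],
}

# Placeholder URL; replace with the address of the deployed service.
request = build_embeddings_request("http://localhost:8060", chat_payload)
print(request.get_method(), request.full_url)

# To send it against a running service:
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
```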

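Response parameters

The standard response format is shown below. It follows the OpenAI /v1/embeddings output specification and additionally supports more than one embedding data structure (the // comments are annotations, not part of the JSON):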

{
  "id": "embed-550e8400-e29b-41d4-a716-446655440000",
  "object": "list",
  "created": 1693645123,
  "model": "text-embedding-chat-model",
  "data": [
    { // Example 1: flat (one-dimensional) embedding vector
      "index": 0,
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789, 0.1011, -0.2022]
    },
    { // Example 2: nested embedding (e.g. token-level output)
      "index": 1,
      "object": "embedding",
      "embedding": [
        [0.001, 0.002, 0.003],
        [0.004, 0.005, 0.006]
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "total_tokens": 42
  }
}

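Field descriptions:

id: unique request identifier (prefixed with pool-)
object: response object type, always "list"
created: request creation time (Unix timestamp)
model: name of the embedding model used
data: array of embedding results, containing one or more embedding objects
- index: index of the corresponding input sequence
- embedding: embedding vector (one-dimensional or two-dimensional)
usage: token usage statistics for the request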
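
Because the embedding field may be either a flat vector or a nested list, client code has to check the shape before using it. The sketch below shows one way to do that on an already-parsed response. The helper names embedding_shape and summarize_response are hypothetical, and the sample data simply mirrors the example response above.

```python
from typing import Any, Dict, List, Tuple


def embedding_shape(embedding: List[Any]) -> Tuple[int, ...]:
    """Return (dim,) for a flat vector or (rows, dim) for a nested one."""
    if embedding and isinstance(embedding[0], list):
        return (len(embedding), len(embedding[0]))
    return (len(embedding),)


def summarize_response(response: Dict[str, Any]) -> List[Tuple[int, Tuple[int, ...]]]:
    """List (index, shape) pairs, ordered by the index of the input."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [(item["index"], embedding_shape(item["embedding"])) for item in items]


sample_response = {
    "object": "list",
    "data": [
        {"index": 0, "object": "embedding",
         "embedding": [0.0123, -0.0456, 0.0789, 0.1011, -0.2022]},
        {"index": 1, "object": "embedding",
         "embedding": [[0.001, 0.002, 0.003], [0.004, 0.005, 0.006]]},
    ],
    "usage": {"prompt_tokens": 42, "total_tokens": 42},
}

print(summarize_response(sample_response))  # [(0, (5,)), (1, (2, 3))]
```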

paddle-bot · 2025-10-10T06:35:40Z

Thanks for your contribution!

lizexu123 · 2025-10-11T06:39:28Z

Please also add a usage demo like the following to the examples above:

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:8060/v1"


def main():
    client = OpenAI(
        # defaults to os.environ.get("OPENAI_API_KEY")
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    # Use the first model the server reports.
    models = client.models.list()
    model = models.data[0].id

    responses = client.embeddings.create(
        # ruff: noqa: E501
        input=[
            # "北京天安门在哪里?",
            "what is your name",
        ],
        model=model,
    )

    for data in responses.data:
        print(data.embedding)  # List of floats of length 4096


if __name__ == "__main__":
    main()

CLAassistant · 2025-10-11T10:55:18Z

All committers have signed the CLA.

sunlei1024 and others added 3 commits October 10, 2025 14:18

feat: add OpenAIServing

7a87fbc

feat: add ZmqOpenAIServing & OpenAIServingEmbedding

9e63f56

feat: Refine the basic ServingEngine class and introduce ServingContext

fb90036

paddle-bot bot added the contributor External developers label Oct 10, 2025

fix: codestyle

d82146f

fix: request

abfc268

sunlei1024 force-pushed the feat/pooling_embedding branch from 71a40a7 to abfc268 Compare October 11, 2025 08:39

sunlei1024 added 2 commits October 11, 2025 12:45

fix: pooling_params

96846f6

feat: _process_chat_template_kwargs

3f9d216

sunlei1024 force-pushed the feat/pooling_embedding branch from 76c5782 to 3f9d216 Compare October 11, 2025 12:47


3 participants
