sunlei1024
Collaborator

Summary

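This PR adds support for pooling embedding models and provides an API implementation that is fully compatible with OpenAI's /v1/embeddings endpoint.
The feature targets users who need high-performance sentence embeddings, providing better embedding representations for downstream tasks such as search, clustering, and recommendation.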


Main changes

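1. New: pooling model embedding support

  • New Feature: Adds low-level interface support for pooling models.
    This lets the service convert an input sequence (e.g. the token vectors of a text) into a fixed-dimension semantic embedding vector through an aggregation operation such as mean pooling.
  • Value: Extends the model types the system supports, so users can generate high-quality sentence embeddings with high-performance semantic models for search, clustering, semantic retrieval, and similar scenarios.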
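The aggregation step described above can be illustrated with a minimal sketch of masked mean pooling. This is only an illustration under stated assumptions: the function name mean_pool and the attention-mask convention (1 for real tokens, 0 for padding) are hypothetical and are not taken from this PR's implementation.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is non-zero (hypothetical helper)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    # Drop padding positions so they do not dilute the average.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    # One output value per hidden dimension, regardless of sequence length.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors of dimension 2; the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

Whatever the input length, the result has one value per hidden dimension, which is what makes the output a fixed-dimension embedding.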

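2. New: OpenAI-compatible API implementation

  • New Feature: Implements an API endpoint fully compatible with the OpenAI /v1/embeddings standard.

  • Compatible request formats: The endpoint supports two request types so that existing OpenAI clients work without changes:

    1. EmbeddingCompletionRequest: accepts input as a string or a list of strings.
    2. EmbeddingChatRequest: accepts a messages list, for embedding chat-style context.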

How to test (cURL examples)

A. EmbeddingCompletionRequest example (standard text input)

curl -X POST 'YOUR_SERVICE_URL/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "text-embedding-chat-model",
    "input": [
      "This is a sentence for pooling embedding.",
      "Another input text."
    ],
    "user": "test_client"
  }'

B. EmbeddingChatRequest example (message sequence input)

curl -X POST 'YOUR_SERVICE_URL/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "text-embedding-chat-model",
    "messages": [
      {"role": "user", "content": "Generate embedding for user query."}
    ]
  }'
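For clients that do not use cURL, the same two requests can be built with Python's standard library. This is a minimal sketch: BASE_URL is a placeholder standing in for YOUR_SERVICE_URL, and build_embedding_request is a hypothetical helper name, not part of this PR. The requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder: replace with YOUR_SERVICE_URL


def build_embedding_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for /v1/embeddings (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A. EmbeddingCompletionRequest: "input" is a string or a list of strings.
completion_payload = {
    "model": "text-embedding-chat-model",
    "input": [
        "This is a sentence for pooling embedding.",
        "Another input text.",
    ],
    "user": "test_client",
}

# B. EmbeddingChatRequest: "messages" is a list of chat messages.
chat_payload = {
    "model": "text-embedding-chat-model",
    "messages": [
        {"role": "user", "content": "Generate embedding for user query."}
    ],
}

completion_req = build_embedding_request(BASE_URL, completion_payload)
chat_req = build_embedding_request(BASE_URL, chat_payload)

# To actually send a request against a running service:
# with urllib.request.urlopen(completion_req) as resp:
#     body = json.load(resp)
```

The two payloads differ only in whether they carry input or messages; the URL, method, and headers are the same.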

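Response parameters

The standard response format is shown below. It is compatible with the OpenAI /v1/embeddings output specification and also supports more than one embedding data structure: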

{
  "id": "embed-550e8400-e29b-41d4-a716-446655440000",
  "object": "list",
  "created": 1693645123,
  "model": "text-embedding-chat-model",
  "data": [
    { // Example 1: single-level embedding vector
      "index": 0,
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789, 0.1011, -0.2022]
    },
    { // Example 2: nested embedding (e.g. token-level output)
      "index": 1,
      "object": "embedding",
      "embedding": [
        [0.001, 0.002, 0.003],
        [0.004, 0.005, 0.006]
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "total_tokens": 42
  }
}

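Field descriptions:

  • id: unique request identifier (prefixed with pool-)
  • object: response object type, always "list"
  • created: request creation time (Unix timestamp)
  • model: name of the embedding model used
  • data: array of embedding results, containing one or more embedding objects
    • index: index of the corresponding input
    • embedding: embedding vector (one- or two-dimensional)
  • usage: token usage statistics for the request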
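Because the embedding field may be either one- or two-dimensional, client code has to check its shape before using it. The sketch below shows one way to do that on an already-parsed response; the helper names embedding_shape and summarize are hypothetical, and the sample dictionary only mirrors the data array from the example response above.

```python
from typing import Any, Dict, List, Tuple


def embedding_shape(embedding: List[Any]) -> Tuple[int, ...]:
    """Return (dim,) for a flat vector or (rows, dim) for a nested one."""
    if embedding and isinstance(embedding[0], list):
        return (len(embedding), len(embedding[0]))
    return (len(embedding),)


def summarize(response: Dict[str, Any]) -> Dict[int, Tuple[int, ...]]:
    """Map each item's index to the shape of its embedding."""
    return {
        item["index"]: embedding_shape(item["embedding"])
        for item in response["data"]
    }


# Parsed form of the "data" array from the example response.
response = {
    "object": "list",
    "data": [
        {
            "index": 0,
            "object": "embedding",
            "embedding": [0.0123, -0.0456, 0.0789, 0.1011, -0.2022],
        },
        {
            "index": 1,
            "object": "embedding",
            "embedding": [[0.001, 0.002, 0.003], [0.004, 0.005, 0.006]],
        },
    ],
}

shapes = summarize(response)
print(shapes)  # {0: (5,), 1: (2, 3)}
```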


paddle-bot bot commented Oct 10, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Oct 10, 2025
@lizexu123
Collaborator

lizexu123 commented Oct 11, 2025

Please add one more usage demo to the ones above:

from openai import OpenAI

# Point the OpenAI client's API key and base URL at the local OpenAI-compatible API server.
openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:8060/v1"


def main():
    client = OpenAI(
        # defaults to os.environ.get("OPENAI_API_KEY")
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    models = client.models.list()
    model = models.data[0].id

    responses = client.embeddings.create(
        # ruff: noqa: E501
        input=[
            # "北京天安门在哪里?",
            "what is your name",
        ],
        model=model,
    )

    for data in responses.data:
        print(data.embedding)  # list of floats of length 4096


if __name__ == "__main__":
    main()

Something along the lines of this usage demo.

@sunlei1024 sunlei1024 force-pushed the feat/pooling_embedding branch from 71a40a7 to abfc268 Compare October 11, 2025 08:39
@CLAassistant

CLAassistant commented Oct 11, 2025

CLA assistant check
All committers have signed the CLA.

@sunlei1024 sunlei1024 force-pushed the feat/pooling_embedding branch from 76c5782 to 3f9d216 Compare October 11, 2025 12:47