
Alibaba Cloud Model Studio: Embedding

Last Updated: Sep 23, 2025

Embedding models convert text, image, and video data into numerical vectors. These vectors are used for tasks such as semantic search, recommendation, clustering, classification, and anomaly detection.

Preparations

You must obtain an API key and set the API key as an environment variable. If you use the OpenAI SDK or DashScope SDK to make calls, you must also install the SDK.

How to get embeddings

Text embedding

Send a request to the API endpoint with the text and the model name, such as `text-embedding-v4`.

OpenAI compatible

Python

import os
from openai import OpenAI

input_texts = "The quality of the clothes is excellent"

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured an environment variable, replace this with your API key.
    # The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.embeddings.create(
    model="text-embedding-v4",
    input=input_texts
)

print(completion.model_dump_json())

Node.js

const OpenAI = require("openai");

// Initialize the OpenAI client.
const openai = new OpenAI({
    // Make sure you have correctly set the DASHSCOPE_API_KEY environment variable.
    apiKey: process.env.DASHSCOPE_API_KEY, // If you have not configured an environment variable, replace this with your API key.
    // The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1.
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

async function getEmbedding() {
    try {
        const inputTexts = "The quality of the clothes is excellent";
        const completion = await openai.embeddings.create({
            model: "text-embedding-v4",
            input: inputTexts,
            dimensions: 1024 // Specify the vector dimensions. This parameter is supported only by text-embedding-v3 and text-embedding-v4.
        });

        console.log(JSON.stringify(completion, null, 2));
    } catch (error) {
        console.error('Error:', error);
    }
}

getEmbedding();

curl

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/embeddings' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-v4",
    "input": "The quality of the clothes is excellent"
}'

DashScope

Python

import dashscope
from http import HTTPStatus

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

input_texts = "The quality of the clothes is excellent"
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=input_texts,
)

if resp.status_code == HTTPStatus.OK:
    print(resp)

Java

import com.alibaba.dashscope.embeddings.TextEmbedding;
import com.alibaba.dashscope.embeddings.TextEmbeddingParam;
import com.alibaba.dashscope.embeddings.TextEmbeddingResult;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;

import java.util.Collections;
public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
        // If you use a model in the China (Beijing) region, replace this with https://dashscope.aliyuncs.com/api/v1.
    }
     public static void main(String[] args) {
        String inputTexts = "The quality of the clothes is excellent";
        try {
            // Build the request parameters.
            TextEmbeddingParam param = TextEmbeddingParam
                    .builder()
                    .model("text-embedding-v4")
                    // Input text.
                    .texts(Collections.singleton(inputTexts))
                    .build();

            // Create a model instance and call it.
            TextEmbedding textEmbedding = new TextEmbedding();
            TextEmbeddingResult result = textEmbedding.call(param);

            // Print the result.
            System.out.println(result);

        } catch (NoApiKeyException e) {
            // Catch and handle exceptions for an unset API key.
            System.err.println("An exception occurred when calling the API: " + e.getMessage());
            System.err.println("Check that your API key is configured correctly.");
            e.printStackTrace();
        }
    }
}

curl

# If you use a model in the China (Beijing) region, replace the endpoint URL with https://dashscope.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-v4",
    "input": {
        "texts": [
        "The quality of the clothes is excellent"
        ]
    }
}'

Multimodal embedding (Available only in the China (Beijing) region)

Currently, multimodal embedding is supported only by calling the multimodal-embedding-v1 model through the DashScope SDK and API.

import dashscope
import json
from http import HTTPStatus

# The input can be a video.
# video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"
# input = [{'video': video}]
# Or an image.
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
input = [{'image': image}]
resp = dashscope.MultiModalEmbedding.call(
    model="multimodal-embedding-v1",
    input=input
)

print(json.dumps(resp.output, indent=4))
    

Model selection

The appropriate model depends on your input data type and application scenario.

  • Plain text or code: Use text-embedding-v4. This is the highest-performing model currently available and includes advanced features such as instruct and sparse vectors. This model is suitable for most text processing scenarios.

  • Processing multimodal content: For content that contains a mix of images, text, or videos, you can choose the general-purpose multimodal model multimodal-embedding-v1.

  • Large-scale data: To process large-scale, non-real-time text data, we recommend using text-embedding-v4 through OpenAI-compatible batch calls to significantly reduce costs, as outlined in the sketch after this list.
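
The following is a minimal sketch of that batch flow using the OpenAI SDK. It assumes the compatible-mode endpoint accepts the standard OpenAI files/batches interface (file purpose "batch", endpoint "/v1/embeddings"); check the Batch documentation for the exact requirements, file format, and polling steps.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Singapore region. For the China (Beijing) region, use https://dashscope.aliyuncs.com/compatible-mode/v1.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# 1. Write a JSONL file in which each line is one embedding request.
with open("embedding_batch.jsonl", "w", encoding="utf-8") as f:
    f.write('{"custom_id": "doc-1", "method": "POST", "url": "/v1/embeddings", '
            '"body": {"model": "text-embedding-v4", "input": "The quality of the clothes is excellent"}}\n')

# 2. Upload the file and create the batch job. Poll the job later and download its output file when it completes.
batch_file = client.files.create(file=open("embedding_batch.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)
print(batch.id, batch.status)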

The following table lists the specifications for all available embedding models.

Text embedding

Singapore

| Model | Embedding dimensions | Batch size | Maximum tokens per line | Price (per million input tokens) | Supported languages | Free quota |
| --- | --- | --- | --- | --- | --- | --- |
| text-embedding-v4 (part of the Qwen3-Embedding series) | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64 | 10 | 8,192 | $0.07 | Over 100 major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | 1 million tokens, valid for 90 days after Model Studio activation |
| text-embedding-v3 | 1,024 (default), 768, 512 | 10 | 8,192 | $0.07 | Over 50 major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | 500,000 tokens, valid for 90 days after Model Studio activation |

Beijing

| Model | Embedding dimensions | Batch size | Maximum tokens per line | Price (per million input tokens) | Supported languages |
| --- | --- | --- | --- | --- | --- |
| text-embedding-v4 (part of the Qwen3-Embedding series) | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64 | 10 | 8,192 | $0.072 | Over 100 major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages |

Note

The batch size is the maximum number of text inputs that can be processed in a single API call. For example, the batch size for text-embedding-v4 is 10. This means you can include up to 10 text inputs in a single embedding request. This limit applies to:

  • String array input: The array can contain up to 10 elements.

  • File input: The text file can contain up to 10 lines of text.
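
To embed more than 10 texts, split the input into chunks of at most 10 and merge the results. The following is a minimal sketch using the DashScope SDK (the helper name and sample texts are illustrative):

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def embed_in_batches(texts, batch_size=10):
    """Call TextEmbedding in chunks of at most `batch_size` texts and collect the vectors in order."""
    all_embeddings = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        resp = dashscope.TextEmbedding.call(model="text-embedding-v4", input=chunk)
        all_embeddings.extend(item['embedding'] for item in resp.output['embeddings'])
    return all_embeddings

vectors = embed_in_batches([f"sample text {i}" for i in range(23)])
print(len(vectors))  # 23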

Multimodal embedding

The model generates continuous vectors from user inputs, such as text, images, or videos. This feature is suitable for scenarios such as video classification, image classification, image-text retrieval, and searching for images or videos using text or other images. Available only in the China (Beijing) region.

| Model | Embedding dimensions | Text length limit | Image/video size limit | Price (per 1,000 input tokens) |
| --- | --- | --- | --- | --- |
| multimodal-embedding-v1 (available only in the China (Beijing) region) | 1,024 | 512 tokens | Image: ≤ 3 MB, 1 image per request; Video: ≤ 10 MB | Free trial, with no token quota limit |

The following input language and format limits apply to the general-purpose multimodal embedding API:

| Input type | Language/format limits |
| --- | --- |
| Text | Chinese, English |
| Image / Multiple images | JPG, PNG, and BMP. Supports input in Base64 format or as a URL. |
| Video | MP4, MPEG, MPG, WEBM, AVI, FLV, MKV, MOV |

The API accepts a single text segment, image, or video file per call, or one combination of different types, such as text plus an image. Each content type can appear at most once in a request, and every input must meet the length and size limits above.
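
For example, the following sketch combines a text segment and an image in a single call. It follows the field pattern of the earlier example; the 'text' field name and the sample inputs are assumptions, so verify them against the multimodal embedding API reference.

import dashscope
import json

# multimodal-embedding-v1 is available only in the China (Beijing) region, so the default
# base URL (https://dashscope.aliyuncs.com/api/v1) applies.
resp = dashscope.MultiModalEmbedding.call(
    model="multimodal-embedding-v1",
    input=[
        {'text': 'A colorful illustration'},  # assumed field name for text input
        {'image': 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png'}
    ]
)
print(json.dumps(resp.output, indent=4))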

Core features

Switch vector dimensions

text-embedding-v4 and text-embedding-v3 support custom vector dimensions. Higher dimensions retain richer semantic information, but they also increase storage and computation costs.

  • General scenarios (Recommended): A dimension of 1024 offers the best balance between performance and cost. It is suitable for most semantic retrieval tasks.

  • High-precision scenarios: For fields that require high precision, you can choose a dimension of 1536 or 2048. This provides a slight precision improvement but significantly increases storage and computation overhead.

  • Resource-constrained scenarios: In highly cost-sensitive scenarios, you can choose a dimension of 768 or lower. This significantly reduces resource consumption but results in some loss of semantic information.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.embeddings.create(
    model="text-embedding-v4",
    input=["I like it and will come back to shop here again."],
    # Set the vector dimension to 256.
    dimensions=256
)
print(f"Vector dimension: {len(resp.data[0].embedding)}")

DashScope

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=["I like it and will come back to shop here again."],
    # Set the vector dimension to 256.
    dimension=256
)

print(f"Vector dimension: {len(resp.output['embeddings'][0]['embedding'])}")

Query and document (text_type)

Only the DashScope SDK and API support this parameter.

For optimal search performance, use the text_type parameter to distinguish queries from documents according to their role in the task:

  • text_type: 'query': For user-input query text. The model generates a "title-like" vector that is more directional and is optimized for asking questions and searching.

  • text_type: 'document' (default): For document text stored in a database. The model generates a "body-like" vector that contains more comprehensive information and is optimized for matching.

Distinguish between query and document when matching short text against long text. In tasks such as clustering or classification where all text inputs have the same role, you do not need to set this parameter.
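
For example, a retrieval pipeline embeds stored documents with text_type 'document' and the incoming question with text_type 'query' before computing similarity. A minimal sketch using the DashScope SDK (the sample texts are illustrative):

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Documents written to the vector store ('document' is also the default).
doc_resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=["Free shipping on orders over $50", "Returns are accepted within 30 days"],
    text_type="document"
)

# The user's question at query time.
query_resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="What is the return policy?",
    text_type="query"
)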

Improve performance with instruct

Only the DashScope SDK and API support this parameter.

By providing a clear English instruction (instruct), you can guide the text-embedding-v4 model to optimize vector quality for specific retrieval scenarios, which improves accuracy. When using this feature, you must set the text_type parameter to query.

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Scenario: optimize a search query for research-paper retrieval. The instruct text describes
# the retrieval task; instruct requires text_type="query".
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="Research papers on machine learning",
    text_type="query",
    instruct="Given a research paper query, retrieve relevant research paper"
)

Dense and sparse vectors

Only the DashScope SDK and API support this parameter.

text-embedding-v4 and text-embedding-v3 can output three types of vectors to suit different retrieval strategies.

| Vector type (output_type) | Core advantage | Main disadvantage | Typical application scenarios |
| --- | --- | --- | --- |
| dense | Deep semantic understanding: identifies synonyms and context, making retrieval results more relevant. | Higher computation and storage costs; cannot guarantee an exact keyword match. | Semantic search, AI chat, content recommendation. |
| sparse | High computational efficiency: focuses on exact keyword matches and enables fast filtering. | Lacks semantic understanding; cannot handle synonyms or context. | Log retrieval, product SKU search, precise information filtering. |
| dense&sparse | Combines semantics and keywords for optimal search results. Generation cost matches the other two types, with no extra API call overhead. | Larger storage requirements; needs a more complex system architecture and retrieval logic. | High-quality, production-grade hybrid search engines. |
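
The following sketch requests both vector types in one call by setting output_type. The response field names used below (for example 'sparse_embedding') are assumptions; inspect the actual response to confirm them.

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="Wireless noise-cancelling headphones",
    output_type="dense&sparse"
)

embedding = resp.output['embeddings'][0]
print(len(embedding['embedding']))        # dense vector
print(embedding.get('sparse_embedding'))  # sparse vector (assumed field name)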

Use cases

The following code is for demonstration purposes only. In a production environment, you should compute vectors in advance and store them in a vector store. During retrieval, you only need to compute the vector for the query.
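
As an illustration of that pattern, the following sketch precomputes document vectors offline, saves them to a local NumPy file as a stand-in for a vector store, and computes only the query vector at retrieval time (the file name and sample documents are placeholders):

import dashscope
import numpy as np

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Offline: embed the corpus once and persist the vectors.
documents = ["Shipping takes 3-5 business days", "Returns are accepted within 30 days", "All products carry a one-year warranty"]
doc_resp = dashscope.TextEmbedding.call(model="text-embedding-v4", input=documents)
np.save("doc_vectors.npy", np.array([e['embedding'] for e in doc_resp.output['embeddings']]))

# Online: embed only the query and compare it against the stored vectors.
query_resp = dashscope.TextEmbedding.call(model="text-embedding-v4", input="How do returns work?")
query_vec = np.array(query_resp.output['embeddings'][0]['embedding'])
stored = np.load("doc_vectors.npy")
scores = stored @ query_vec / (np.linalg.norm(stored, axis=1) * np.linalg.norm(query_vec))
print(documents[int(np.argmax(scores))])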

Semantic search

You can achieve precise semantic matching by calculating the vector similarity between the query and documents.

import dashscope
import numpy as np
from dashscope import TextEmbedding

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def semantic_search(query, documents, top_k=5):
    """Semantic search"""
    # Generate query vector
    query_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=query,
        dimension=1024
    )
    query_embedding = query_resp.output['embeddings'][0]['embedding']

    # Generate document vectors
    doc_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=documents,
        dimension=1024
    )

    # Calculate similarity
    similarities = []
    for i, doc_emb in enumerate(doc_resp.output['embeddings']):
        similarity = cosine_similarity(query_embedding, doc_emb['embedding'])
        similarities.append((i, similarity))

    # Sort and return top_k results
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [(documents[i], sim) for i, sim in similarities[:top_k]]

# Example usage
documents = [
    "Artificial intelligence is a branch of computer science",
    "Machine learning is an important method for achieving artificial intelligence",
    "Deep learning is a subfield of machine learning"
]
query = "What is AI?"
results = semantic_search(query, documents, top_k=2)
for doc, sim in results:
    print(f"Similarity: {sim:.3f}, Document: {doc}")

Recommendation system

You can analyze user history vectors to discover user preferences and recommend similar items.

import dashscope
import numpy as np
from dashscope import TextEmbedding

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
def build_recommendation_system(user_history, all_items, top_k=10):
    """Build a recommendation system"""
    # Generate user history vectors
    history_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=user_history,
        dimension=1024
    )

    # Calculate user preference vector (by averaging)
    user_embedding = np.mean([
        emb['embedding'] for emb in history_resp.output['embeddings']
    ], axis=0)

    # Generate all item vectors
    items_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=all_items,
        dimension=1024
    )

    # Calculate recommendation scores
    recommendations = []
    for i, item_emb in enumerate(items_resp.output['embeddings']):
        score = cosine_similarity(user_embedding, item_emb['embedding'])
        recommendations.append((all_items[i], score))

    # Sort and return recommendation results
    recommendations.sort(key=lambda x: x[1], reverse=True)
    return recommendations[:top_k]

# Example usage
user_history = ["Science Fiction", "Action", "Suspense"]
all_movies = ["Future World", "Space Adventure", "Ancient Warfare", "Romantic Journey", "Superhero"]
recommendations = build_recommendation_system(user_history, all_movies)
for movie, score in recommendations:
    print(f"Recommendation Score: {score:.3f}, Movie: {movie}")

Text clustering

You can automatically group similar texts by analyzing the distances between their vectors.

# You need to install scikit-learn: pip install scikit-learn
import dashscope
import numpy as np
from sklearn.cluster import KMeans

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cluster_texts(texts, n_clusters=2):
    """Cluster a set of texts"""
    # 1. Get the vectors for all texts.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=texts,
        dimension=1024
    )
    embeddings = np.array([item['embedding'] for item in resp.output['embeddings']])

    # 2. Use the KMeans algorithm for clustering.
    kmeans = KMeans(n_clusters=n_clusters, random_state=0, n_init='auto').fit(embeddings)

    # 3. Organize and return the results.
    clusters = {i: [] for i in range(n_clusters)}
    for i, label in enumerate(kmeans.labels_):
        clusters[label].append(texts[i])
    return clusters


# Example usage
documents_to_cluster = [
    "Phone company A releases a new phone",
    "Search engine company B launches a new system",
    "World Cup final: Argentina vs. France",
    "Chinese team wins another gold at the Olympics",
    "A company releases its latest AI chip",
    "European Championship match report"
]
clusters = cluster_texts(documents_to_cluster, n_clusters=2)
for cluster_id, docs in clusters.items():
    print(f"--- Category {cluster_id} ---")
    for doc in docs:
        print(f"- {doc}")

Text classification

By calculating the vector similarity between an input text and predefined category labels, you can classify text into those categories without any labeled training examples (zero-shot classification).

import dashscope
import numpy as np

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def classify_text_zero_shot(text, labels):
    """Zero-shot text classification"""
    # 1. Get the vectors for the input text and all labels.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=[text] + labels,
        dimension=1024
    )
    embeddings = resp.output['embeddings']
    text_embedding = embeddings[0]['embedding']
    label_embeddings = [emb['embedding'] for emb in embeddings[1:]]

    # 2. Calculate the similarity with each label.
    scores = [cosine_similarity(text_embedding, label_emb) for label_emb in label_embeddings]

    # 3. Return the label with the highest similarity.
    best_match_index = np.argmax(scores)
    return labels[best_match_index], scores[best_match_index]


# Example usage
text_to_classify = "The fabric of this dress is comfortable, and the style is nice"
possible_labels = ["Digital Products", "Apparel & Accessories", "Food & Beverage", "Home & Living"]

label, score = classify_text_zero_shot(text_to_classify, possible_labels)
print(f"Input text: '{text_to_classify}'")
print(f"The best matching category is: '{label}' (Similarity: {score:.3f})")

Anomaly detection

You can identify anomalous data that deviates significantly from normal patterns by comparing a text's vector with the centroid of vectors computed from normal samples.

import dashscope
import numpy as np

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def detect_anomaly(new_comment, normal_comments, threshold=0.6):
    # 1. Vectorize all normal comments and the new comment.
    all_texts = normal_comments + [new_comment]
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=all_texts,
        dimension=1024
    )
    embeddings = [item['embedding'] for item in resp.output['embeddings']]

    # 2. Calculate the center vector (average) of the normal comments.
    normal_embeddings = np.array(embeddings[:-1])
    normal_center_vector = np.mean(normal_embeddings, axis=0)

    # 3. Calculate the similarity between the new comment and the center vector.
    new_comment_embedding = np.array(embeddings[-1])
    similarity = cosine_similarity(new_comment_embedding, normal_center_vector)

    # 4. Determine if it is an anomaly.
    is_anomaly = similarity < threshold
    return is_anomaly, similarity


# Example usage
normal_user_comments = [
    "Today's meeting was very productive",
    "The project is progressing smoothly",
    "New version to be released next week",
    "Good user feedback"
]

test_comments = {
    "Normal comment": "The feature works as expected",
    "Anomaly - meaningless garbled text": "asdfghjkl zxcvbnm"
}

print("--- Anomaly Detection Example ---")
for desc, comment in test_comments.items():
    is_anomaly, score = detect_anomaly(comment, normal_user_comments)
    result = "Yes" if is_anomaly else "No"
    print(f"Comment: '{comment}'")
    print(f"Is it an anomaly: {result} (Similarity to normal samples: {score:.3f})\n")

API reference

Error codes

If a call fails, see Error messages for troubleshooting.

Rate limits

See Rate limits.

Model performance (MTEB/CMTEB)

Evaluation benchmarks

  • MTEB: Massive Text Embedding Benchmark. This benchmark evaluates model performance across various tasks, such as classification, clustering, and retrieval.

  • CMTEB: Chinese Massive Text Embedding Benchmark. This benchmark is designed specifically for evaluating Chinese text.

  • Scores range from 0 to 100, where a higher value indicates better performance.

| Model | MTEB | MTEB (retrieval task) | CMTEB | CMTEB (retrieval task) |
| --- | --- | --- | --- | --- |
| text-embedding-v3 (512 dimensions) | 62.11 | 54.30 | 66.81 | 71.88 |
| text-embedding-v3 (768 dimensions) | 62.43 | 54.74 | 67.90 | 72.29 |
| text-embedding-v3 (1024 dimensions) | 63.39 | 55.41 | 68.92 | 73.23 |
| text-embedding-v4 (512 dimensions) | 64.73 | 56.34 | 68.79 | 73.33 |
| text-embedding-v4 (1024 dimensions) | 68.36 | 59.30 | 70.14 | 73.98 |
| text-embedding-v4 (2048 dimensions) | 71.58 | 61.97 | 71.99 | 75.01 |