Embedding models convert text, image, and video data into numerical vectors. These vectors are used for tasks such as semantic search, recommendation, clustering, classification, and anomaly detection.
Preparations
You must obtain an API key and set it as an environment variable. If you use the OpenAI SDK or DashScope SDK to make calls, you must also install the corresponding SDK.
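To quickly confirm that the key is visible to your code, you can run a minimal check such as the following sketch. It assumes you store the key in the DASHSCOPE_API_KEY environment variable, as the examples below do.

import os

# A minimal environment check, assuming the API key is stored in DASHSCOPE_API_KEY.
api_key = os.getenv("DASHSCOPE_API_KEY")
if api_key:
    print("DASHSCOPE_API_KEY is set.")
else:
    print("DASHSCOPE_API_KEY is not set. Configure it before running the examples below.")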
How to get embeddings
Text embedding
Send a request to the API endpoint with the text and the model name, such as `text-embedding-v4`.
OpenAI compatible
import os
from openai import OpenAI

input_texts = "The quality of the clothes is excellent"

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured an environment variable, replace this with your API key.
    # The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.embeddings.create(
    model="text-embedding-v4",
    input=input_texts
)
print(completion.model_dump_json())
const OpenAI = require("openai");

// Initialize the OpenAI client.
const openai = new OpenAI({
    // Make sure you have correctly set the DASHSCOPE_API_KEY environment variable.
    apiKey: process.env.DASHSCOPE_API_KEY, // If you have not configured an environment variable, replace this with your API key.
    // The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1.
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

async function getEmbedding() {
    try {
        const inputTexts = "The quality of the clothes is excellent";
        const completion = await openai.embeddings.create({
            model: "text-embedding-v4",
            input: inputTexts,
            dimensions: 1024 // Specify the vector dimensions. This parameter is supported only by text-embedding-v3 and text-embedding-v4.
        });
        console.log(JSON.stringify(completion, null, 2));
    } catch (error) {
        console.error('Error:', error);
    }
}

getEmbedding();
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/embeddings' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "text-embedding-v4",
"input": "The quality of the clothes is excellent"
}'
DashScope
import dashscope
from http import HTTPStatus

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

input_texts = "The quality of the clothes is excellent"

resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=input_texts,
)

if resp.status_code == HTTPStatus.OK:
    print(resp)
import com.alibaba.dashscope.embeddings.TextEmbedding;
import com.alibaba.dashscope.embeddings.TextEmbeddingParam;
import com.alibaba.dashscope.embeddings.TextEmbeddingResult;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;

import java.util.Collections;

public class Main {
    static {
        // If you use a model in the China (Beijing) region, replace this with https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    public static void main(String[] args) {
        String inputTexts = "The quality of the clothes is excellent";
        try {
            // Build the request parameters.
            TextEmbeddingParam param = TextEmbeddingParam
                    .builder()
                    .model("text-embedding-v4")
                    // Input text.
                    .texts(Collections.singleton(inputTexts))
                    .build();
            // Create a model instance and call it.
            TextEmbedding textEmbedding = new TextEmbedding();
            TextEmbeddingResult result = textEmbedding.call(param);
            // Print the result.
            System.out.println(result);
        } catch (NoApiKeyException e) {
            // Catch and handle exceptions for an unset API key.
            System.err.println("An exception occurred when calling the API: " + e.getMessage());
            System.err.println("Check that your API key is configured correctly.");
            e.printStackTrace();
        }
    }
}
# If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "text-embedding-v4",
"input": {
"texts": [
"The quality of the clothes is excellent"
]
}
}'
Multimodal embedding (Available only in the China (Beijing) region)
Currently, multimodal embedding is supported only by calling the multimodal-embedding-v1
model through the DashScope SDK and API.
import dashscope
import json
from http import HTTPStatus

# The input can be a video.
# video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"
# input = [{'video': video}]
# Or an image.
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
input = [{'image': image}]

resp = dashscope.MultiModalEmbedding.call(
    model="multimodal-embedding-v1",
    input=input
)
print(json.dumps(resp.output, indent=4))
Model selection
The appropriate model depends on your input data type and application scenario.
Plain text or code: Use text-embedding-v4. This is the highest-performing model currently available and includes advanced features such as instruct and sparse vectors. It is suitable for most text processing scenarios.
Multimodal content: For content that mixes images, text, or videos, choose the general-purpose multimodal model multimodal-embedding-v1.
Large-scale data: To process large-scale, non-real-time text data, we recommend using text-embedding-v4 through OpenAI-compatible batch calls to significantly reduce costs (see the sketch after this list).
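The sketch below shows the general shape of such a batch flow through the OpenAI-compatible Batch interface: write one embeddings request per line into a JSONL file, upload it, then create a batch job and poll it until completion. Whether your account and region support batch embedding jobs, and the exact completion window, are assumptions to verify against the Batch documentation.

import os
import json
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Each JSONL line is one embeddings request; custom_id is an identifier you choose.
texts = ["First document", "Second document"]
with open("embedding_requests.jsonl", "w", encoding="utf-8") as f:
    for i, text in enumerate(texts):
        f.write(json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": "text-embedding-v4", "input": text},
        }) + "\n")

# Upload the request file, then create the batch job.
with open("embedding_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)
print(batch.id, batch.status)  # Poll client.batches.retrieve(batch.id) until the job completes.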
The following table lists the specifications for all available embedding models.
Text embedding
Singapore
Model | Embedding dimensions | Batch size | Maximum tokens per line | Price (per million input tokens) | Supported languages | Free quota |
text-embedding-v4 (part of the Qwen3-Embedding series) | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64 | 10 | 8,192 | $0.07 | Over 100 major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | 1 million tokens. Validity: 90 days after Model Studio activation |
text-embedding-v3 | 1,024 (default), 768, 512 | | | | Over 50 major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | 500,000 tokens. Validity: 90 days after Model Studio activation |
Beijing
Model | Embedding dimensions | Batch size | Maximum tokens per line | Price (per million input tokens) | Supported languages |
text-embedding-v4 Part of the Qwen3-Embedding series | 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64 | 10 | 8,192 | $0.072 | Over 100 major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, and various programming languages |
The batch size is the maximum number of text inputs that can be processed in a single API call. For example, the batch size for text-embedding-v4 is 10. This means you can include up to 10 text inputs in a single embedding request. This limit applies to:
String array input: The array can contain up to 10 elements.
File input: The text file can contain up to 10 lines of text.
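If you have more than 10 texts, one straightforward approach is to split the list into chunks of at most 10 and make one call per chunk, as in the following sketch. The chunk size of 10 matches text-embedding-v4; adjust it for other models.

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def embed_in_chunks(texts, batch_size=10):
    """Embed an arbitrarily long list of texts, at most batch_size texts per API call."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        resp = dashscope.TextEmbedding.call(
            model="text-embedding-v4",
            input=chunk,
        )
        embeddings.extend(item['embedding'] for item in resp.output['embeddings'])
    return embeddings

vectors = embed_in_chunks([f"Sample text {i}" for i in range(25)])
print(len(vectors))  # 25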
Multimodal embedding
The model generates continuous vectors from user inputs, such as text, images, or videos. This feature is suitable for scenarios such as video classification, image classification, image-text retrieval, and searching for images or videos using text or other images. Available only in the China (Beijing) region.
Model | Embedding dimensions | Text length limit | Image/video size limit | Price (per 1,000 input tokens) |
multimodal-embedding-v1 (available only in the China (Beijing) region) | 1,024 | 512 tokens | Image size: ≤ 3 MB, quantity: 1 | Free trial, with no token quota limit. |
The following input language and format limits apply to the general-purpose multimodal embedding API:
Input type | Language/Format limit |
Text | Chinese/English |
Image | JPG, PNG, and BMP. Supports input in Base64 format or as a URL. |
Multiple images | |
Video | MP4, MPEG, MPG, WEBM, AVI, FLV, MKV, MOV |
The API supports uploading a single text segment, image, or video file. You can also combine different types, such as text and an image. However, only one combination is allowed per call. Each content type can be included a maximum of once in the request, and the file must meet the specified length or size requirements.
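Combining types in a single call could look like the following sketch, which follows the input format of the example above. The 'text' key and the sample caption/image pairing are illustrative assumptions to check against the multimodal API reference.

import dashscope
import json

# multimodal-embedding-v1 is available only in the China (Beijing) region, so the default
# DashScope endpoint is used here.
text = "General scenery photo"  # Assumed 'text' key, following the pattern of the 'image' and 'video' keys above.
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"

# One combination per call: each content type appears at most once.
inputs = [{'text': text}, {'image': image}]

resp = dashscope.MultiModalEmbedding.call(
    model="multimodal-embedding-v1",
    input=inputs
)
print(json.dumps(resp.output, indent=4))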
Core features
Switch vector dimensions
text-embedding-v4 and text-embedding-v3 support custom vector dimensions. Higher dimensions retain richer semantic information, but they also increase storage and computation costs.
General scenarios (Recommended): A dimension of 1024 offers the best balance between performance and cost. It is suitable for most semantic retrieval tasks.
High-precision scenarios: For fields that require high precision, you can choose a dimension of 1536 or 2048. This provides a slight precision improvement but significantly increases storage and computation overhead.
Resource-constrained scenarios: In highly cost-sensitive scenarios, you can choose a dimension of 768 or lower. This significantly reduces resource consumption but results in some loss of semantic information.
OpenAI compatible
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.embeddings.create(
    model="text-embedding-v4",
    input=["I like it and will come back to shop here again."],
    # Set the vector dimension to 256.
    dimensions=256
)
print(f"Vector dimension: {len(resp.data[0].embedding)}")
DashScope
import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=["I like it and will come back to shop here again."],
    # Set the vector dimension to 256.
    dimension=256
)
print(f"Vector dimension: {len(resp.output['embeddings'][0]['embedding'])}")
query and document (text_type)
Only the DashScope SDK and API support this parameter.
For optimal search performance, differentiate between queries and documents based on the task using the text_type parameter:
text_type: 'query': For user-input query text. The model generates a "title-like" vector that is more directional and is optimized for asking questions and searching.
text_type: 'document' (default): For document text stored in a database. The model generates a "body-like" vector that contains more comprehensive information and is optimized for matching.
Distinguish between query and document when matching short text against long text. In tasks such as clustering or classification, where all text inputs have the same role, you do not need to set this parameter.
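A minimal sketch of using the two settings together, embedding stored documents with the default 'document' type and the user query with the 'query' type (the sample strings are placeholders):

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Documents stored in the database: use the default 'document' text type.
doc_resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=["Return policy: items can be returned within 30 days."],
    text_type="document",
)

# User query at search time: use the 'query' text type.
query_resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="How long do I have to return an item?",
    text_type="query",
)
print(len(doc_resp.output['embeddings']), len(query_resp.output['embeddings']))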
Improve performance with instruct
Only the DashScope SDK and API support this parameter.
By providing a clear English instruction (instruct), you can guide the text-embedding-v4 model to optimize vector quality for specific retrieval scenarios, which improves accuracy. When using this feature, you must set the text_type parameter to query.
import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Scenario: for a research paper search engine, add an instruction so that query vectors are optimized for retrieval.
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="Research papers on machine learning",
    text_type="query",
    instruct="Given a research paper query, retrieve relevant research papers"
)
print(resp)
Dense and sparse vectors
Only the DashScope SDK and API support this parameter.
text-embedding-v4
and text-embedding-v3
can output three types of vectors to suit different retrieval strategies.
Vector type (output_type) | Core advantage | Main disadvantage | Typical application scenario |
dense | Deep semantic understanding. It can identify synonyms and context, making retrieval results more relevant. | Higher computation and storage costs. It cannot guarantee an exact keyword match. | Semantic search, AI chat, content recommendation. |
sparse | High computational efficiency. It focuses on an exact match of keywords and enables fast filtering. | Lacks semantic understanding. It cannot handle synonyms or context. | Log retrieval, product SKU search, precise information filtering. |
dense&sparse | Combines semantics and keywords for optimal search results. Generating both types costs the same as generating dense or sparse alone, with no additional API call overhead. | Larger storage requirements, plus a more complex system architecture and retrieval logic. | High-quality, production-grade hybrid search engines. |
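The table header names the output_type parameter; a minimal sketch of requesting both vector types in one call might look like the following. The exact layout of the sparse fields in the response is not shown here, so inspect the returned item directly.

import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="Waterproof hiking boots size 42",
    output_type="dense&sparse",
)

# Inspect the dense and sparse fields returned for this model/parameter combination.
first = resp.output['embeddings'][0]
print(first)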
Use cases
The following code is for demonstration purposes only. In a production environment, you should compute vectors in advance and store them in a vector store. During retrieval, you only need to compute the vector for the query.
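As a rough illustration of that pattern, the sketch below computes document vectors once, keeps them in a NumPy array as a stand-in for a real vector store, and embeds only the query at search time. The embed helper is hypothetical.

import dashscope
import numpy as np

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def embed(texts):
    resp = dashscope.TextEmbedding.call(model="text-embedding-v4", input=texts, dimension=1024)
    return np.array([item['embedding'] for item in resp.output['embeddings']])

# Offline: embed the corpus once and persist the matrix (here simply kept in memory).
documents = ["Shipping takes 3-5 business days", "Returns are accepted within 30 days"]
doc_matrix = embed(documents)  # shape: (num_docs, 1024)

# Online: embed only the query, then rank by cosine similarity against the stored matrix.
query_vec = embed(["How long does delivery take?"])[0]
scores = doc_matrix @ query_vec / (np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec))
print(documents[int(np.argmax(scores))])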
Semantic search
You can achieve precise semantic matching by calculating the vector similarity between the query and documents.
import dashscope
import numpy as np
from dashscope import TextEmbedding

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def semantic_search(query, documents, top_k=5):
    """Semantic search"""
    # Generate the query vector
    query_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=query,
        dimension=1024
    )
    query_embedding = query_resp.output['embeddings'][0]['embedding']
    # Generate document vectors
    doc_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=documents,
        dimension=1024
    )
    # Calculate similarity
    similarities = []
    for i, doc_emb in enumerate(doc_resp.output['embeddings']):
        similarity = cosine_similarity(query_embedding, doc_emb['embedding'])
        similarities.append((i, similarity))
    # Sort and return the top_k results
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [(documents[i], sim) for i, sim in similarities[:top_k]]

# Example usage
documents = [
    "Artificial intelligence is a branch of computer science",
    "Machine learning is an important method for achieving artificial intelligence",
    "Deep learning is a subfield of machine learning"
]
query = "What is AI?"
results = semantic_search(query, documents, top_k=2)
for doc, sim in results:
    print(f"Similarity: {sim:.3f}, Document: {doc}")
Recommendation system
You can analyze user history vectors to discover user preferences and recommend similar items.
import dashscope
import numpy as np
from dashscope import TextEmbedding

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def build_recommendation_system(user_history, all_items, top_k=10):
    """Build a recommendation system"""
    # Generate user history vectors
    history_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=user_history,
        dimension=1024
    )
    # Calculate the user preference vector (by averaging)
    user_embedding = np.mean([
        emb['embedding'] for emb in history_resp.output['embeddings']
    ], axis=0)
    # Generate vectors for all items
    items_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=all_items,
        dimension=1024
    )
    # Calculate recommendation scores
    recommendations = []
    for i, item_emb in enumerate(items_resp.output['embeddings']):
        score = cosine_similarity(user_embedding, item_emb['embedding'])
        recommendations.append((all_items[i], score))
    # Sort and return the recommendation results
    recommendations.sort(key=lambda x: x[1], reverse=True)
    return recommendations[:top_k]

# Example usage
user_history = ["Science Fiction", "Action", "Suspense"]
all_movies = ["Future World", "Space Adventure", "Ancient Warfare", "Romantic Journey", "Superhero"]
recommendations = build_recommendation_system(user_history, all_movies)
for movie, score in recommendations:
    print(f"Recommendation Score: {score:.3f}, Movie: {movie}")
Text clustering
You can automatically group similar texts by analyzing the distances between their vectors.
# You need to install scikit-learn: pip install scikit-learn
import dashscope
import numpy as np
from sklearn.cluster import KMeans

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cluster_texts(texts, n_clusters=2):
    """Cluster a set of texts"""
    # 1. Get the vectors for all texts.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=texts,
        dimension=1024
    )
    embeddings = np.array([item['embedding'] for item in resp.output['embeddings']])
    # 2. Use the KMeans algorithm for clustering.
    kmeans = KMeans(n_clusters=n_clusters, random_state=0, n_init='auto').fit(embeddings)
    # 3. Organize and return the results.
    clusters = {i: [] for i in range(n_clusters)}
    for i, label in enumerate(kmeans.labels_):
        clusters[label].append(texts[i])
    return clusters

# Example usage
documents_to_cluster = [
    "Phone company A releases a new phone",
    "Search engine company B launches a new system",
    "World Cup final: Argentina vs. France",
    "Chinese team wins another gold at the Olympics",
    "A company releases its latest AI chip",
    "European Championship match report"
]
clusters = cluster_texts(documents_to_cluster, n_clusters=2)
for cluster_id, docs in clusters.items():
    print(f"--- Category {cluster_id} ---")
    for doc in docs:
        print(f"- {doc}")
Text classification
By calculating the vector similarity between an input text and predefined category labels, you can assign the text to a category without any pre-labeled training examples (zero-shot classification).
import dashscope
import numpy as np

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify_text_zero_shot(text, labels):
    """Zero-shot text classification"""
    # 1. Get the vectors for the input text and all labels.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=[text] + labels,
        dimension=1024
    )
    embeddings = resp.output['embeddings']
    text_embedding = embeddings[0]['embedding']
    label_embeddings = [emb['embedding'] for emb in embeddings[1:]]
    # 2. Calculate the similarity with each label.
    scores = [cosine_similarity(text_embedding, label_emb) for label_emb in label_embeddings]
    # 3. Return the label with the highest similarity.
    best_match_index = np.argmax(scores)
    return labels[best_match_index], scores[best_match_index]

# Example usage
text_to_classify = "The fabric of this dress is comfortable, and the style is nice"
possible_labels = ["Digital Products", "Apparel & Accessories", "Food & Beverage", "Home & Living"]
label, score = classify_text_zero_shot(text_to_classify, possible_labels)
print(f"Input text: '{text_to_classify}'")
print(f"The best matching category is: '{label}' (Similarity: {score:.3f})")
Anomaly detection
You can identify anomalous data that significantly differs from normal patterns by calculating the vector similarity between a text vector and the center of normal sample vectors.
import dashscope
import numpy as np

# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def detect_anomaly(new_comment, normal_comments, threshold=0.6):
    # 1. Vectorize all normal comments and the new comment.
    all_texts = normal_comments + [new_comment]
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=all_texts,
        dimension=1024
    )
    embeddings = [item['embedding'] for item in resp.output['embeddings']]
    # 2. Calculate the center vector (average) of the normal comments.
    normal_embeddings = np.array(embeddings[:-1])
    normal_center_vector = np.mean(normal_embeddings, axis=0)
    # 3. Calculate the similarity between the new comment and the center vector.
    new_comment_embedding = np.array(embeddings[-1])
    similarity = cosine_similarity(new_comment_embedding, normal_center_vector)
    # 4. Determine whether it is an anomaly.
    is_anomaly = similarity < threshold
    return is_anomaly, similarity

# Example usage
normal_user_comments = [
    "Today's meeting was very productive",
    "The project is progressing smoothly",
    "New version to be released next week",
    "Good user feedback"
]
test_comments = {
    "Normal comment": "The feature works as expected",
    "Anomaly - meaningless garbled text": "asdfghjkl zxcvbnm"
}
print("--- Anomaly Detection Example ---")
for desc, comment in test_comments.items():
    is_anomaly, score = detect_anomaly(comment, normal_user_comments)
    result = "Yes" if is_anomaly else "No"
    print(f"Comment: '{comment}'")
    print(f"Is it an anomaly: {result} (Similarity to normal samples: {score:.3f})\n")
API reference
General-purpose text embedding
Multimodal embedding
Error codes
If a call fails, see Error messages for troubleshooting.
Rate limits
See Rate limits.
Model performance (MTEB/CMTEB)
Evaluation benchmarks
MTEB: Massive Text Embedding Benchmark. This benchmark evaluates model performance across various tasks, such as classification, clustering, and retrieval.
CMTEB: Chinese Massive Text Embedding Benchmark. This benchmark is designed specifically for evaluating Chinese text.
Scores range from 0 to 100, where a higher value indicates better performance.
Model | MTEB | MTEB (Retrieval task) | CMTEB | CMTEB (Retrieval task) |
text-embedding-v3 (512 dimensions) | 62.11 | 54.30 | 66.81 | 71.88 |
text-embedding-v3 (768 dimensions) | 62.43 | 54.74 | 67.90 | 72.29 |
text-embedding-v3 (1024 dimensions) | 63.39 | 55.41 | 68.92 | 73.23 |
text-embedding-v4 (512 dimensions) | 64.73 | 56.34 | 68.79 | 73.33 |
text-embedding-v4 (1024 dimensions) | 68.36 | 59.30 | 70.14 | 73.98 |
text-embedding-v4 (2048 dimensions) | 71.58 | 61.97 | 71.99 | 75.01 |