All Products
Search
Document Center

Alibaba Cloud Model Studio:Models and pricing

Last Updated:Oct 03, 2025

Flagship models (Singapore region)

Flagship model

通义new Qwen-Max

Ideal for complex tasks, most powerful.

通义new Qwen-Plus

Balanced performance, speed, and cost.

通义new Qwen-Flash

Ideal for simple tasks, fast and low-cost.

通义new Qwen-Coder

Excellent code model, excels at tool calling and environment interaction.

Maximum context window

(Tokens)

262,144

1,000,000

1,000,000

1,000,000

Minimum input price

(Million tokens)

$1.6

$0.4

$0.05

$0.3

Minimum output price

(Million tokens)

$6.4

$1.2

$0.4

$1.5

Model overview

Category

Subcategory

Description

Text generation

General-purpose LLMs

Qwen large language models: Commercial models (Qwen-Max, Qwen-Plus, Qwen-Flash), open-source models (Qwen3, Qwen2.5)

Multimodal models

Visual understanding model Qwen-VL, visual reasoning model QVQ, omni-modal model Qwen-Omni, and real-time multi-modal model Qwen-Omni-Realtime

Domain-specific models

Coder model, Translation model, Role-playing model

Image generation

Text-to-image

Image editing

Qwen-Image-Edit: Supports Chinese and English prompts and performs complex image and text editing operations, such as style transfer, text modification, and object editing.

Video generation

Text-to-video

Generates videos from a single sentence, offering rich styles and fine image quality.

Image-to-video

  • First-frame-to-video: Uses an input image as the first frame and generates a video based on a prompt.

  • First-and-last-frame-to-video: Generates a smooth and dynamic video based on the provided first and last frames and a prompt.

  • Multi-image-to-video: Generates a video by referencing the entity or background in one or more input images, combined with a prompt.

Video editing

General-purpose video editing: Performs various video editing tasks based on input text, images, and videos. For example, it can generate a new video by extracting motion features from an input video and combining them with a prompt.

Embedding

Text embedding

Converts text into a set of numbers that can represent the text, suitable for search, clustering, recommendation, and classification tasks.

Text generation - Qwen

The following are the Qwen commercial models. Compared to the open-source versions, the commercial models offer the latest capabilities and improvements.

The parameter sizes of the commercial models are not disclosed.
Each model is updated and upgraded periodically. To use a fixed version, you can select a snapshot. A snapshot is typically maintained for one month after the release of the next snapshot.
You can use the stable or latest version for more lenient rate limiting conditions.

Qwen-Max

This is the best-performing model in the Qwen series. It is suitable for complex, multi-step tasks. Usage | API reference | Try it online

The Qwen-Max model does not support deep thinking.

Qwen3-Max

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-max

Currently same capabilties as qwen3-max-2025-09-23

Stable

262,144

258,048

65,536

Tiered pricing, see the description below the table.

1 million tokens

Valid for 90 days after activating Alibaba Cloud Model Studio

qwen3-max-2025-09-23

Snapshot

qwen3-max-preview

Preview

Qwen3-Max uses tiered pricing based on the number of input tokens (left-open, right-closed intervals).

Input tokens

Input price (Million tokens)

qwen3-max and qwen3-max-preview support context cache.

Output price (Million tokens)

0–32K

$1.2

$6

32K–128K

$2.4

$12

128K–252K

$3

$15

Qwen-Max

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-max

Provides the same capabilities as qwen-max-2025-01-25.

Stable

32,768

30,720

8,192

$1.6

50% discount for batch calls

$6.4

50% discount for batch calls

1 million tokens for input and 1 million for output

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen-max-latest

Corresponds to the latest snapshot.

Latest

$1.6

$6.4

qwen-max-2025-01-25

Also known as qwen-max-0125, Qwen2.5-Max

Snapshot

Qwen-Plus

This model provides a balance of capabilities. Its inference performance, cost, and speed fall between Qwen-Max and Qwen-Flash, which makes it ideal for moderately complex tasks. Usage | API reference | Try it online | Deep thinking

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-plus

Has the same capabilities as qwen-plus-2025-07-28.
Part of the Qwen3 series.

Stable

1,000,000

Thinking mode

995,904

Non-thinking mode

997,952

The default values are 262,144. You can adjust this value using the max_input_tokens parameter.

32,768

Maximum chain-of-thought: 81,920

Tiered pricing, see the description below the table.

1 million tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen-plus-latest

Has the same capabilities as qwen-plus-2025-07-28.
Part of the Qwen3 series.

Latest

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-09-11

Part of the Qwen3 series.

Snapshot

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-07-28

Also known as qwen-plus-0728.
Part of the Qwen3 series.

qwen-plus-2025-07-14

Also known as qwen-plus-0714.
Part of the Qwen3 series.

131,072

Thinking mode

98,304

Non-thinking mode

129,024

16,384

Maximum chain-of-thought: 38,912

$0.4

Thinking mode

$4

Non-thinking mode

$1.2

qwen-plus-2025-04-28

Also known as qwen-plus-0428.
Part of the Qwen3 series.

qwen-plus-2025-01-25

Also known as qwen-plus-0125.

129,024

8,192

$1.2

The qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 models use tiered pricing based on the number of input tokens per request (left-open, right-closed intervals).

Input tokens

Input price (Million tokens)

Mode

Output price (Million tokens)

0–256K

$0.4

Non-thinking mode

$1.2

Thinking mode

$4

256K–1M

$1.2

Non-thinking mode

$3.6

Thinking mode

$12

The qwen-plus-2025-09-11, qwen-plus-2025-07-28, qwen-plus-2025-07-14, qwen-plus-2025-04-28, qwen-plus-latest, and qwen-plus models support both thinking and non-thinking modes. You can switch between these modes using the enable_thinking parameter. In addition, the capabilities of these models have been significantly improved:

  1. Reasoning capability: In evaluations for math, code, and logical reasoning, it significantly outperforms QwQ and non-reasoning models of a similar size, which achieves top-tier performance in the industry for its scale.

  2. Human preference alignment: It features greatly enhanced capabilities in creative writing, role assumption, multi-turn conversation, and instruction following. Its general capabilities significantly exceed those of models of a similar size.

  3. Agent capability: It achieves industry-leading performance in both thinking and non-thinking modes and can accurately invoke external tools.

  4. Multilingual capability: It supports over 100 languages and dialects, with significantly improved capabilities in multilingual translation, instruction understanding, and common-sense reasoning.

    Supported languages

    English

    Simplified Chinese

    Traditional Chinese

    French

    Spanish

    Arabic is written in the Arabic script and is the official language in many Arab countries.

    Russian is written in the Cyrillic script and is the official language of Russia and several other countries.

    Portuguese is written in the Latin script and is the official language of Portugal, Brazil, and other Portuguese-speaking countries.

    German is written in the Latin script and is the official language of countries such as Germany and Austria.

    Italian is written in the Latin script and is the official language of Italy, San Marino, and parts of Switzerland.

    Dutch is written in the Latin script and is the official language of the Netherlands, parts of Belgium (Flanders), and Suriname.

    Danish is written in the Latin script and is the official language of Denmark.

    Irish is written in the Latin script and is one of the official languages of Ireland.

    Welsh is written in the Latin script and is an official language of Wales.

    Finnish is written in the Latin script and is the official language of Finland.

    Icelandic is written in the Latin script and is the official language of Iceland.

    Swedish is written in the Latin script and is the official language of Sweden.

    Norwegian Nynorsk is written in the Latin script and is one of two official written standards for the Norwegian language.

    Norwegian Bokmål is written in the Latin script and is the more common of the two official written standards for the Norwegian language.

    Japanese is written in Japanese script and is the official language of Japan.

    Korean is written in Hangul and is the official language of South Korea and North Korea.

    Vietnamese is written in the Latin script and is the official language of Vietnam.

    Thai is written in the Thai script and is the official language of Thailand.

    Indonesian is written in the Latin script and is the official language of Indonesia.

    Malay is written in the Latin script and is a major language in countries such as Malaysia.

    Burmese is written in the Burmese script and is the official language of Myanmar.

    Tagalog is written in the Latin script and is one of the major languages of the Philippines.

    Khmer is written in the Khmer script and is the official language of Cambodia.

    Lao is written in the Lao script and is the official language of Laos.

    Hindi is written in the Devanagari script and is one of the official languages of India.

    Bengali is written in the Bengali script and is the official language of Bangladesh and the Indian state of West Bengal.

    Urdu is written in the Arabic script. It is an official language of Pakistan and is also widely spoken in India.

    Nepali is written in the Devanagari script and is the official language of Nepal.

    Hebrew is written in the Hebrew script and is the official language of Israel.

    Turkish is written in the Latin script and is the official language of Türkiye and Northern Cyprus.

    Persian is written in the Arabic script and is the official language of countries such as Iran and Tajikistan.

    Polish is written in the Latin script and is the official language of Poland.

    Ukrainian is written in the Cyrillic script and is the official language of Ukraine.

    Czech is written in the Latin script and is the official language of the Czech Republic.

    Romanian is written in the Latin script and is the official language of Romania and Moldova.

    Bulgarian is written in the Cyrillic script and is the official language of Bulgaria.

    Slovak is written in the Latin script and is the official language of Slovakia.

    Hungarian is written in the Latin script and is the official language of Hungary.

    Slovenian is written in the Latin script and is the official language of Slovenia.

    Latvian is written in the Latin script and is the official language of Latvia.

    Estonian is written in the Latin script and is the official language of Estonia.

    Lithuanian is written in the Latin script and is the official language of Lithuania.

    Belarusian is written in the Cyrillic script and is one of the official languages of Belarus.

    Greek is written in the Greek script and is the official language of Greece and Cyprus.

    Croatian is written in the Latin script and is the official language of Croatia.

    Macedonian is written in the Cyrillic script and is the official language of North Macedonia.

    Maltese is written in the Latin script and is the official language of Malta.

    Serbian is written in the Cyrillic script and is the official language of Serbia.

    Bosnian is written in the Latin script and is one of the official languages of Bosnia and Herzegovina.

    Georgian is written in the Georgian script and is the official language of Georgia.

    Armenian is written in the Armenian script and is the official language of Armenia.

    North Azerbaijani is written in the Latin script and is the official language of Azerbaijan.

    Kazakh is written in the Cyrillic script and is the official language of Kazakhstan.

    Northern Uzbek is written in the Latin script and is the official language of Uzbekistan.

    Tajik is written in the Cyrillic script and is the official language of Tajikistan.

    Swahili is written in the Latin script and is a lingua franca or an official language in many East African countries.

    Afrikaans is written in the Latin script and is mainly spoken in South Africa and Namibia.

    Cantonese is written in Traditional Chinese characters and is a primary language in Guangdong Province, Hong Kong, and Macao.

    Luxembourgish is written in the Latin script. It is an official language of Luxembourg and is also spoken in parts of Germany.

    Limburgish is written in the Latin script and is mainly spoken in the Netherlands, Belgium, and parts of Germany.

    Catalan is written in the Latin script and is spoken in Catalonia and other parts of Spain.

    Galician is written in the Latin script and is mainly spoken in the Galicia region of Spain.

    Asturian is written in the Latin script and is mainly spoken in the Asturias region of Spain.

    Basque is written in the Latin script. It is mainly spoken in the Basque Country of Spain and France and is an official language of the Basque Autonomous Community in Spain.

    Occitan is written in the Latin script and is mainly spoken in the southern regions of France.

    Venetian is written in the Latin script and is mainly spoken in the Veneto region of Italy.

    Sardinian is written in the Latin script and is mainly spoken on the island of Sardinia in Italy.

    Sicilian is written in the Latin script and is mainly spoken on the island of Sicily in Italy.

    Friulian is written in the Latin script and is mainly spoken in the Friuli-Venezia Giulia region of Italy.

    Lombard is written in the Latin script and is mainly spoken in the Lombardy region of Italy.

    Ligurian is written in the Latin script and is mainly spoken in the Liguria region of Italy.

    Faroese is written in the Latin script. It is mainly spoken in the Faroe Islands and is one of their official languages.

    Tosk Albanian is written in the Latin script and is the southern dialect of Albanian.

    Silesian is written in the Latin script and is mainly spoken in Poland.

    Bashkir is written in the Cyrillic script and is mainly spoken in the Republic of Bashkortostan, Russia.

    Tatar is written in the Cyrillic script and is mainly spoken in the Republic of Tatarstan, Russia.

    Mesopotamian Arabic is written in the Arabic script and is mainly spoken in Iraq.

    Najdi Arabic is written in the Arabic script and is mainly spoken in the Najd region of Saudi Arabia.

    Egyptian Arabic is written in the Arabic script and is mainly spoken in Egypt.

    Levantine Arabic is written in the Arabic script and is mainly spoken in Syria and Lebanon.

    Ta'izzi-Adeni Arabic is written in the Arabic script and is mainly spoken in Yemen and the Hadhramaut region of Saudi Arabia.

    Dari is written in the Arabic script and is one of the official languages of Afghanistan.

    Tunisian Arabic is written in the Arabic script and is mainly spoken in Tunisia.

    Moroccan Arabic is written in the Arabic script and is mainly spoken in Morocco.

    Kabuverdianu is written in the Latin script and is mainly spoken in Cape Verde.

    Tok Pisin is written in the Latin script and is a major lingua franca in Papua New Guinea.

    Eastern Yiddish is written in the Hebrew script and is mainly spoken in Jewish communities.

    Sindhi is written in the Arabic script and is an official language of the Sindh province in Pakistan.

    Sinhala is written in the Sinhala script and is one of the official languages of Sri Lanka.

    Telugu is written in the Telugu script and is an official language of the Indian states of Andhra Pradesh and Telangana.

    Punjabi is written in the Gurmukhi script. It is spoken in the Indian state of Punjab and is one of India's official languages.

    Tamil is written in the Tamil script and is an official language of the Indian state of Tamil Nadu and of Sri Lanka.

    Gujarati is written in the Gujarati script and is an official language of the Indian state of Gujarat.

    Malayalam is written in the Malayalam script and is the official language of the Indian state of Kerala.

    Marathi is written in the Devanagari script and is the official language of the Indian state of Maharashtra.

    Kannada is written in the Kannada script and is the official language of the Indian state of Karnataka.

    Magahi is written in the Devanagari script and is mainly spoken in the Indian state of Bihar.

    Odia is written in the Odia script and is the official language of the Indian state of Odisha.

    Awadhi is written in the Devanagari script and is mainly spoken in the Indian state of Uttar Pradesh.

    Maithili is written in the Devanagari script. It is spoken in the Indian state of Bihar and the Terai plains of Nepal and is one of India's official languages.

    Assamese is written in the Bengali script and is the official language of the Indian state of Assam.

    Chhattisgarhi is written in the Devanagari script and is mainly spoken in the Indian state of Chhattisgarh.

    Bhojpuri is written in the Devanagari script and is spoken in parts of India and Nepal.

    Minangkabau is written in the Latin script and is mainly spoken on the island of Sumatra in Indonesia.

    Balinese is written in the Latin script and is mainly spoken on the island of Bali in Indonesia.

    Javanese is written in the Latin script but also commonly in the Javanese script. It is widely spoken on the island of Java in Indonesia.

    Banjar is written in the Latin script and is mainly spoken on the island of Kalimantan in Indonesia.

    Sundanese is written in the Latin script but traditionally in the Sundanese script. It is mainly spoken in the western part of the island of Java in Indonesia.

    Cebuano is written in the Latin script and is mainly spoken in the Cebu region of the Philippines.

    Pangasinan is written in the Latin script and is mainly spoken in the Pangasinan province of the Philippines.

    Iloko is written in the Latin script and is mainly spoken in the Philippines.

    Waray is written in the Latin script and is mainly spoken in the Philippines.

    Haitian Creole is written in the Latin script and is one of the official languages of Haiti.

    Papiamento is written in the Latin script and is mainly spoken in Caribbean regions such as Aruba and Curaçao.

  5. Response format: This version fixes response format issues from previous versions, such as abnormal Markdown, intermediate truncation, and incorrect boxed output.

Qwen-Flash

Qwen-Flash is the fastest and most cost-effective model in the Qwen series and is suitable for simple jobs. It uses flexible tiered pricing. Usage | API reference | Try it online | Thinking mode

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-flash

This model has the same capabilities as qwen-flash-2025-07-28.
Part of the Qwen3 series.

A 50% discount applies to batch calls.

Stable

1,000,000

Thinking mode

995,904

Non-thinking mode

997,952

32,768

Maximum chain-of-thought: 81,920.

Tiered pricing, see the description below this table.

1 million input and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen-flash-2025-07-28

Part of the Qwen3 series.

Snapshot

The qwen-flash and qwen-flash-2025-07-28 models use tiered pricing based on the number of input tokens in each request (left-open, right-closed intervals). The qwen-flash model supports caching and batch calling.

Input token count

Input price (Million tokens)

Output price (Million tokens)

0–256K

$0.05

$0.40

256K–1M

$0.25

$2.00

Qwen-Turbo

Qwen-Turbo is deprecated. We recommend Qwen-Flash instead. Qwen-Flash offers flexible tiered pricing for more cost-effective billing. Usage | API reference | Try it online | Deep thinking

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

Note

(Tokens)

(Million tokens)

qwen-turbo

Provides the same capabilities as qwen-turbo-2025-04-28.
Part of the Qwen3 series

Stable

Thinking mode

131,072

Non-thinking mode

1,000,000

Thinking mode

98,304

Non-thinking mode

1,000,000

16,384

Maximum chain-of-thought: 38,912

$0.05

Half price for batch calling

Thinking mode: $0.5

Non-thinking mode: $0.2

Half price for batch calling

1 million tokens each

Valid for 90 days after you activate Alibaba Cloud Model Studio.

qwen-turbo-latest

Provides the same capabilities as the latest snapshot.
Part of the Qwen3 series

Latest

$0.05

Thinking mode: $0.5

Non-thinking mode: $0.2

qwen-turbo-2025-04-28

Also known as qwen-turbo-0428
Part of the Qwen3 series

Snapshot

qwen-turbo-2024-11-01

Also known as qwen-turbo-1101

1,000,000

1,000,000

8,192

$0.2

The latest qwen-turbo-2025-04-28 and qwen-turbo-latest models have thinking and non-thinking mode response capabilities. You can switch between the two modes using the enable_thinking parameter. In addition, the model's capabilities have been significantly improved:

  1. Reasoning capability: In evaluations for math, code, and logical reasoning, it significantly outperforms QwQ and non-reasoning models of a similar size, which reaches the top tier in the industry for its scale.

  2. Human preference alignment: Capabilities in creative writing, role assumption, multi-turn conversation, and instruction following are greatly enhanced. Its general capabilities significantly exceed those of models of a similar size.

  3. Agent capability: This model reaches industry-leading levels in both reasoning and non-reasoning modes. It can achieve precise external tool invocation.

  4. Multilingual capability: This model supports over 100 languages and dialects. Capabilities in multilingual translation, instruction understanding, and common-sense reasoning are significantly improved.

    Supported languages

    English

    Simplified Chinese

    Traditional Chinese

    French

    Spanish

    Arabic, written in the Arabic script, is the official language of many Arab countries.

    Russian, written in the Cyrillic script, is the official language of Russia and some other countries.

    Portuguese, written in the Latin script, is the official language of Portugal, Brazil, and other Portuguese-speaking countries.

    German, written in the Latin script, is the official language of countries such as Germany and Austria.

    Italian, written in the Latin script, is the official language of Italy, San Marino, and parts of Switzerland.

    Dutch, written in the Latin script, is the official language of the Netherlands, parts of Belgium (Flanders), and Suriname.

    Danish, written in the Latin script, is the official language of Denmark.

    Irish, written in the Latin script, is one of the official languages of Ireland.

    Welsh, written in the Latin script, is one of the official languages of Wales.

    Finnish, written in the Latin script, is the official language of Finland.

    Icelandic, written in the Latin script, is the official language of Iceland.

    Swedish, written in the Latin script, is the official language of Sweden.

    Norwegian Nynorsk, written in the Latin script, is an official written standard for the Norwegian language, used alongside Norwegian Bokmål.

    Norwegian Bokmål, written in the Latin script, is the most common written standard for the Norwegian language.

    Japanese, written in the Japanese script, is the official language of Japan.

    Korean, written in the Hangul script, is the official language of South Korea and North Korea.

    Vietnamese, written in the Latin script, is the official language of Vietnam.

    Thai, written in the Thai script, is the official language of Thailand.

    Indonesian, written in the Latin script, is the official language of Indonesia.

    Malay, written in the Latin script, is a major language in countries such as Malaysia.

    Burmese, written in the Burmese script, is the official language of Myanmar.

    Tagalog, written in the Latin script, is one of the major languages of the Philippines.

    Khmer, written in the Khmer script, is the official language of Cambodia.

    Lao, written in the Lao script, is the official language of Laos.

    Hindi, written in the Devanagari script, is one of the official languages of India.

    Bengali, written in the Bengali script, is the official language of Bangladesh and the Indian state of West Bengal.

    Urdu, written in the Arabic script, is an official language of Pakistan and is also spoken in India.

    Nepali, written in the Devanagari script, is the official language of Nepal.

    Hebrew, written in the Hebrew script, is the official language of Israel.

    Turkish, written in the Latin script, is the official language of Türkiye and Northern Cyprus.

    Persian, written in the Arabic script, is the official language of countries such as Iran and Tajikistan.

    Polish, written in the Latin script, is the official language of Poland.

    Ukrainian, written in the Cyrillic script, is the official language of Ukraine.

    Czech, written in the Latin script, is the official language of the Czech Republic.

    Romanian, written in the Latin script, is the official language of Romania and Moldova.

    Bulgarian, written in the Cyrillic script, is the official language of Bulgaria.

    Slovak, written in the Latin script, is the official language of Slovakia.

    Hungarian, written in the Latin script, is the official language of Hungary.

    Slovenian, written in the Latin script, is the official language of Slovenia.

    Latvian, written in the Latin script, is the official language of Latvia.

    Estonian, written in the Latin script, is the official language of Estonia.

    Lithuanian, written in the Latin script, is the official language of Lithuania.

    Belarusian, written in the Cyrillic script, is one of the official languages of Belarus.

    Greek, written in the Greek script, is the official language of Greece and Cyprus.

    Croatian, written in the Latin script, is the official language of Croatia.

    Macedonian, written in the Cyrillic script, is the official language of North Macedonia.

    Maltese, written in the Latin script, is the official language of Malta.

    Serbian, written in the Cyrillic script, is the official language of Serbia.

    Bosnian, written in the Latin script, is one of the official languages of Bosnia and Herzegovina.

    Georgian, written in the Georgian script, is the official language of Georgia.

    Armenian, written in the Armenian script, is the official language of Armenia.

    North Azerbaijani, written in the Latin script, is the official language of Azerbaijan.

    Kazakh, written in the Cyrillic script, is the official language of Kazakhstan.

    Northern Uzbek, written in the Latin script, is the official language of Uzbekistan.

    Tajik, written in the Cyrillic script, is the official language of Tajikistan.

    Swahili, written in the Latin script, is a lingua franca or official language in many East African countries.

    Afrikaans, written in the Latin script, is primarily spoken in South Africa and Namibia.

    Cantonese is written in Traditional Chinese characters and is a major language spoken in Guangdong Province, Hong Kong, and Macao.

    Luxembourgish, written in the Latin script, is one of the official languages of Luxembourg and is also spoken in parts of Germany.

    Limburgish, written in the Latin script, is primarily spoken in the Netherlands, Belgium, and parts of Germany.

    Catalan, written in the Latin script, is spoken in Catalonia and other parts of Spain.

    Galician, written in the Latin script, is primarily spoken in the Galicia region of Spain.

    Asturian, written in the Latin script, is primarily spoken in the Asturias region of Spain.

    Basque, written in the Latin script, is primarily spoken in the Basque Country of Spain and France. It is one of the official languages of the Basque Autonomous Community in Spain.

    Occitan, written in the Latin script, is primarily spoken in the southern regions of France.

    Venetian, written in the Latin script, is primarily spoken in the Veneto region of Italy.

    Sardinian, written in the Latin script, is primarily spoken on the island of Sardinia in Italy.

    Sicilian, written in the Latin script, is primarily spoken on the island of Sicily in Italy.

    Friulian, written in the Latin script, is primarily spoken in the Friuli-Venezia Giulia region of Italy.

    Lombard, written in the Latin script, is primarily spoken in the Lombardy region of Italy.

    Ligurian, written in the Latin script, is primarily spoken in the Liguria region of Italy.

    Faroese, written in the Latin script, is one of the official languages of the Faroe Islands.

    Tosk Albanian, written in the Latin script, is the southern dialect of the Albanian language.

    Silesian, written in the Latin script, is primarily spoken in Poland.

    Bashkir, written in the Cyrillic script, is primarily spoken in Bashkortostan, Russia.

    Tatar, written in the Cyrillic script, is primarily spoken in Tatarstan, Russia.

    Mesopotamian Arabic, written in the Arabic script, is primarily spoken in Iraq.

    Najdi Arabic, written in the Arabic script, is primarily spoken in the Najd region of Saudi Arabia.

    Egyptian Arabic, written in the Arabic script, is primarily spoken in Egypt.

    Levantine Arabic, written in the Arabic script, is primarily spoken in Syria and Lebanon.

    Ta'izzi-Adeni Arabic, written in the Arabic script, is primarily spoken in Yemen and the Hadhramaut region of Saudi Arabia.

    Dari, written in the Arabic script, is one of the official languages of Afghanistan.

    Tunisian Arabic, written in the Arabic script, is primarily spoken in Tunisia.

    Moroccan Arabic, written in the Arabic script, is primarily spoken in Morocco.

    Kabuverdianu, written in the Latin script, is primarily spoken in Cape Verde.

    Tok Pisin, written in the Latin script, is a major lingua franca of Papua New Guinea.

    Eastern Yiddish, written in the Hebrew script, is primarily spoken in Jewish communities.

    Sindhi, written in the Arabic script, is one of the official languages of the Sindh province in Pakistan.

    Sinhala, written in the Sinhala script, is one of the official languages of Sri Lanka.

    Telugu, written in the Telugu script, is one of the official languages of the Indian states of Andhra Pradesh and Telangana.

    Punjabi, written in the Gurmukhi script, is one of India's official languages and is spoken in the state of Punjab.

    Tamil, written in the Tamil script, is an official language of the Indian state of Tamil Nadu and of Sri Lanka.

    Gujarati, written in the Gujarati script, is one of the official languages of the Indian state of Gujarat.

    Malayalam, written in the Malayalam script, is one of the official languages of the Indian state of Kerala.

    Marathi, written in the Devanagari script, is one of the official languages of the Indian state of Maharashtra.

    Kannada, written in the Kannada script, is one of the official languages of the Indian state of Karnataka.

    Magahi, written in the Devanagari script, is primarily spoken in the Indian state of Bihar.

    Oriya, written in the Odia script, is one of the official languages of the Indian state of Odisha.

    Awadhi, written in the Devanagari script, is primarily spoken in the Indian state of Uttar Pradesh.

    Maithili, written in the Devanagari script, is one of India's official languages and is spoken in the Indian state of Bihar and the Terai plains of Nepal.

    Assamese, written in the Bengali script, is one of the official languages of the Indian state of Assam.

    Chhattisgarhi, written in the Devanagari script, is primarily spoken in the Indian state of Chhattisgarh.

    Bhojpuri, written in the Devanagari script, is spoken in parts of India and Nepal.

    Minangkabau, written in the Latin script, is primarily spoken on the island of Sumatra in Indonesia.

    Balinese, written in the Latin script, is primarily spoken on the island of Bali in Indonesia.

    Javanese, written in the Latin script, is widely spoken on the island of Java in Indonesia. The Javanese script is also commonly used.

    Banjar, written in the Latin script, is primarily spoken on the island of Kalimantan in Indonesia.

    Sundanese, written in the Latin script, is primarily spoken in the western part of Java, Indonesia. The Sundanese script was also traditionally used.

    Cebuano, written in the Latin script, is primarily spoken in the Cebu region of the Philippines.

    Pangasinan, written in the Latin script, is primarily spoken in the Pangasinan province of the Philippines.

    Iloko, written in the Latin script, is primarily spoken in the Philippines.

    Waray (Philippines), written in the Latin script, is primarily spoken in the Philippines.

    Haitian Creole, written in the Latin script, is one of the official languages of Haiti.

    Papiamento, written in the Latin script, is primarily spoken in Caribbean regions such as Aruba and Curaçao.

  5. Response format fixes: Fixes response format issues from previous versions, such as abnormal Markdown, intermediate truncation, and incorrect boxed output.

QwQ

QwQ is a reasoning model trained based on the Qwen2.5 model. Its reasoning capability has been significantly improved through reinforcement learning. The model's core metrics for math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) are on par with the full-power version of DeepSeek-R1. Usage

Model

Version

Context window

Maximum input

Maximum chain-of-thought

Maximum response

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwq-plus

Stable

131,072

98,304

32,768

8,192

$0.8

$2.4

1 million tokens

Validity: 90 days after you activate Model Studio

Qwen-Omni

Qwen-Omni accepts multimodal inputs, such as text, images, audio, and video. It generates text or speech responses. The model provides a variety of expressive, human-like voices and supports speech output in multiple languages and dialects. It can be used in audio and video chat scenarios, such as visual recognition, emotion detection, education, and training. Usage | API reference

Qwen3-Omni-Flash

Model

Version

Mode

Context window

Maximum input

Maximum CoT

Maximum output

Free quota

(Note)

(Tokens)

qwen3-omni-flash

Currently same capability as qwen3-omni-flash-2025-09-15

Stable

Thinking

65,536

16,384

32,768

16,384

1 million tokens each (regardless of modality)

Valid for 90 days after activation

Non-thinking

49,152

-

qwen3-omni-flash-2025-09-15

Also qwen3-omni-flash-0915

Snapshot

Thinking

65,536

16,384

32,768

16,384

Non-thinking

49,152

-

After you use up your free quota, inputs and outputs are billed as follows. The billing is the same for both thinking and non-thinking modes. Audio output is not supported in thinking mode.

Input billing items

Unit price (Million tokens)

Input: Text

$0.43

Input: Audio

$3.81

Input: Image/Video

$0.78

Output billing items

Unit price (Million tokens)

Output: Text

$1.66 (when the input contains only text)

$3.96 (when the input contains images or audio)

Output: Text and audio

This item is not billed in thinking mode.

$15.11 (audio)

The output text is not billed.

Qwen-Omni-Turbo (based on Qwen2.5)

Model

Version

Context window

Maximum input

Maximum output

Free quota

(Note)

(Tokens)

qwen-omni-turbo

Currently has the same capabilities as qwen-omni-turbo-2025-03-26.

Stable

32,768

30,720

2,048

1 million tokens each (regardless of modality)

This quota is valid for 90 days after you activate Model Studio.

qwen-omni-turbo-latest

Always has the same capabilities as the latest snapshot version.

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326.

Snapshot

After you use up the free quota for the commercial model, the billing rules for inputs and outputs are as follows:

Input billing item

Price (Million tokens)

Input: Text

$0.07

Input: Audio

$4.44

Input: Image/Video

$0.21

Output billing item

Price (Million tokens)

Output: Text

$0.27 (for text-only input)

$0.63 (for input containing images, audio, or video)

Output: Text + Audio

$8.89 (Audio)

The output text is not billed.

Qwen3-Omni-Flash is recommended. It offers significant improvements in capabilities compared to Qwen-Omni-Turbo, which is no longer updated:

  • It is a hybrid model that supports both thinking and non-thinking modes. You can switch between the two modes using the enable_thinking parameter. The thinking mode is disabled by default.

  • Audio output is not supported in thinking mode. In non-thinking mode, the model's audio output has the following features:

    • The number of supported voices is increased to 17. Qwen-Omni-Turbo supports only 4.

    • The number of supported languages is increased to 10. Qwen-Omni-Turbo supports only 2.

Qwen-Omni-Realtime

Unlike Qwen-Omni, Qwen-Omni-Realtime supports audio stream inputs. It has a built-in Voice Activity Detection (VAD) feature that automatically detects the start and end of user speech. UsageClient eventsSever events

Qwen3-Omni-Flash-Realtime

Model

Version

Context window

Maximum input

Maximum output

Free quota

(Note)

(Tokens)

qwen3-omni-flash-realtime

Current capabilities are equivalent to qwen3-omni-flash-realtime-2025-09-15

Stable

65,536

49,152

16,384

1 million tokens each (regardless of modality)

Valid for 90 days after you activate Model Studio.

qwen3-omni-flash-realtime-2025-09-15

Snapshot

After you use up the free quota, the billing rules for inputs and outputs are as follows:

Input billing item

Price (Million tokens)

Input: Text

$0.52

Input: Audio

$4.57

Input: Image/Video

$0.94

Output billing item

Price (Million tokens)

Output: Text

$1.99 (for text-only input)

$3.67 (for input containing images or audio)

Output: Text + Audio

$18.13 (for audio)

The output text is not billed.

Qwen-Omni-Turbo-Realtime (based on Qwen2.5)

Model

Version

Context window

Maximum input

Maximum output

Free quota

(Note)

(Tokens)

qwen-omni-turbo-realtime

Currently has the same capabilities as qwen-omni-turbo-realtime-2025-05-08.

Stable

32,768

30,720

2,048

1 million tokens each (regardless of modality)

Valid for 90 days after you activate Model Studio.

qwen-omni-turbo-realtime-latest

Always has the same capabilities the latest snapshot version.

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

After you use up the free quota, the billing rules for inputs and outputs are as follows:

Input billing item

Price (Million tokens)

Input: Text

$0.270

Input: Audio

$4.440

Input: Image

$0.840

Output billing item

Price (Million tokens)

Output: Text

$1.070 (for text-only input)

$2.520 (for input containing images or audio)

Output: Text + Audio

$8.890 (for audio)

The output text is not billed.

Qwen3-Omni-Flash-Realtime is recommended. It provides significant improvements over Qwen-Omni-Turbo-Realtime, which will no longer be updated. For audio output from the model:

  • Supports 17 voices, whereas Qwen-Omni-Turbo-Realtime supports only 4.

  • Supports 10 languages, whereas Qwen-Omni-Turbo-Realtime supports only 2.

QVQ

QVQ is a visual reasoning model that supports visual input and chain-of-thought output. It demonstrates enhanced capabilities in math, programming, visual analysis, creation, and general tasks. Usage

Model

Version

Context window

Maximum input

Maximum CoT

Maximum response

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qvq-max

Currently same performance as qvq-max-2025-03-25

Stable

131,072

106,496

Up to 16,384 per image

16,384

8,192

$1.2

$4.8

1 million tokens each

Valid for 180 days after activation

qvq-max-latest

Always same performance as the latest snapshot

Latest

qvq-max-2025-03-25

Also qvq-max-0325

Snapshot

Qwen-VL

Qwen-VL is a text generation model with visual (image) understanding capabilities. It comes in two series: QwenVL-Max and QwenVL-Plus. It can perform OCR and also summarize and reason. For example, it can extract properties from product photos or solve problems based on exercise diagrams. Usage | API reference | Try it online

Qwen-VL models are billed based on the total number of input and output tokens.
Image token calculation rule: Visual understanding.

Qwen3-VL-Plus

Model

Version

Mode

Context window

Maximum input

Maximum chain-of-thought

Maximum output

Input price

Output price

Chain-of-thought + output

Free quota

(Note)

(Tokens)

(Per 1,000 tokens)

qwen3-vl-plus

Currently has the same capabilities as qwen3-vl-plus-2025-09-23

Stable

Thinking

262,144

258,048

Max 16,384 per image

81,920

32,768

Tiered pricing. For more information, see the notes below the table.

1 million tokens for input and output each

Validity: 90 days after you activate Alibaba Cloud Model Studio

Non-thinking

262,144

260,096

Max 16,384 per image

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

262,144

258,048

Max 16,384 per image

81,920

32,768

Non-thinking

262,144

260,096

Max 16,384 per image

-

The qwen3-vl-plus and qwen3-vl-plus-2025-09-23 models use a tiered billing method based on the number of input tokens in each request. The input and output prices are the same for both the thinking and non-thinking modes.

Number of input tokens

Input price (Million tokens)

Output price (Million tokens)

0 to 32K

$0.2

$1.6

32K to 128K

$0.3

$2.4

128K to 256K

$0.6

$4.8

QwenVL-Max

This is the most powerful model in the Qwen-VL series. The following models belong to the Qwen2.5-VL series.

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-vl-max

Offers further improvements in visual reasoning and instruction following capabilities compared to qwen-vl-plus, delivering optimal performance on more complex tasks.
Currently has the same capabilities as qwen-vl-max-2025-08-13

Stable

131,072

129,024

Max 16,384 per image

8,192

$0.8

50% off for batch calls

$3.2

50% off for batch calls

1 million tokens for input and output each

Validity: 90 days after you activate Alibaba Cloud Model Studio

qwen-vl-max-latest

Always has the same capabilities as the latest snapshot

Latest

$0.8

$3.2

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813
Features comprehensive improvements in visual understanding metrics, with significantly enhanced capabilities in mathematics, reasoning, object detection, and multilingual processing.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408
Belongs to the Qwen2.5-VL series. The context is extended to 128k, and the mathematics and reasoning capabilities are significantly enhanced.

QwenVL-Plus

The QwenVL-Plus model offers a balance between performance and cost. The following models belong to the Qwen2.5-VL series.

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-vl-plus

Currently has the same capabilities as qwen-vl-plus-2025-08-15

Stable

131,072

129,024

Max 16,384 per image

8,192

$0.21

50% off for batch calls

$0.63

50% off for batch calls

1 million tokens for input and output each

Validity: 90 days after you activate Alibaba Cloud Model Studio

qwen-vl-plus-latest

Always has the same capabilities as the latest snapshot

Latest

$0.21

$0.63

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815
Significantly improved capabilities in object detection and localization, and multilingual processing

Snapshot

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507
Significantly improves the ability to understand mathematics, reasoning, and content from monitoring videos

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125
Belongs to the Qwen2.5-VL series. The context is extended to 128k, and the image and video understanding capabilities are significantly enhanced.

Qwen-OCR

The Qwen-OCR model is specialized for text extraction. Compared to the Qwen-VL model, it focuses more on extracting text from images such as documents, forms, exam questions, and handwritten text. It can recognize multiple languages, including English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try it online

Model

Version

Context window

Maximum input

Maximum output

Input and output unit price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-vl-ocr

Stable

34,096

30,000

A maximum of 30,000 tokens per image.

4,096

$0.72

1 million input tokens and 1 million output tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

Qwen-ASR

Based on Qwen's multimodal model, Qwen-ASR supports multilingual recognition, singing recognition, and noise rejection. Usage

Model

Version

Supported languages

Supported sample rates

Unit price

Free quota (Note)

qwen3-asr-flash

Currently equivalent to qwen3-asr-flash-2025-09-08

Stable

Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish

16 kHz

$0.000035/second

36,000 seconds (10 hours)

Validity: 90 days after you activate Model Studio

qwen3-asr-flash-2025-09-08

Snapshot

Qwen-Coder

This is the Qwen code model. The latest Qwen3-Coder series models are code generation models based on Qwen3. They have powerful coding Agent capabilities, excel at tool calling and environment interaction, and can perform autonomous programming. They combine excellent coding skills with general-purpose capabilities. Usage | API reference

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-coder-plus

Currently has the same capabilities as qwen3-coder-plus-2025-07-22

Stable

1,000,000

997,952

65,536

Tiered pricing. See the description below the table.

1 million tokens each

Valid for 90 days after you activate Model Studio

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently has the same capabilities as qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

The preceding models use a tiered billing method based on the number of input tokens in each request (left-open, right-closed intervals).

qwen3-coder-plus

The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. qwen3-coder-plus supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price. Input text that hits the explicit cache is billed at 10% of the unit price.

Input tokens

Input cost (Million tokens)

Output cost (Million tokens)

0–32K

$1

$5

32K–128K

$1.8

$9

128K–256K

$3

$15

256K–1M

$6

$60

qwen3-coder-flash series

The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. qwen3-coder-flash supports context cache. Input text that hits the implicit cache is billed at 20% of the unit price.

Input tokens

Input cost (Million tokens)

Output cost (Million tokens)

0–32K

$0.3

$1.5

32K–128K

$0.5

$2.5

128K–256K

$0.8

$4

256K–1M

$1.6

$9.6

Qwen-MT

This is a flagship large translation model fully upgraded based on Qwen 3. It supports mutual translation across 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. The model's performance and translation quality are comprehensively upgraded. It provides more stable term customization, format retention, and domain-specific prompt capabilities, which makes translations more accurate and natural. Usage

Model

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen-mt-plus

Qwen3-MT

16,384

8,192

8,192

$2.46

$7.37

1 million tokens per model

Valid for 90 days after activating Alibaba Cloud Model Studio

qwen-mt-turbo

Qwen3-MT

$0.16

$0.49

Text generation - Qwen open-source versions

  • In the model names, `xxb` indicates the parameter size. For example, `qwen2-72b-instruct` indicates a parameter size of 72 billion (72B).

  • Alibaba Cloud Model Studio supports calling the open-source versions of Qwen. You do not need to deploy the models locally. For open-source versions, we recommend using the Qwen3 and Qwen2.5 models.

Qwen3

The qwen3-next-80b-a3b-thinking model, released in September 2025, supports only thinking mode. It features improved instruction-following capabilities compared to the qwen3-235b-a22b-thinking-2507 model, resulting in more concise summary responses.

The qwen3-next-80b-a3b-instruct model, released in September 2025, supports only non-thinking mode. It offers enhanced Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.

The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025 and supporting only the thinking mode, are upgrades to the thinking mode of the qwen3-235b-a22b and qwen3-30b-a3b models.

The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025 and supporting only the non-thinking mode, are upgrades to the non-thinking mode of the qwen3-235b-a22b and qwen3-30b-a3b models.

The Qwen3 models released in April 2025 support thinking and non-thinking modes. You can switch between the two modes using the enable_thinking parameter. In addition, the capabilities of the Qwen3 models have been significantly improved:

  1. Reasoning capability: In evaluations for math, code, and logical reasoning, it significantly outperforms QwQ and non-reasoning models of a similar size, which reaches the top tier in the industry for its scale.

  2. Human preference alignment: Capabilities in creative writing, role assumption, multi-turn conversation, and instruction following are greatly enhanced. Its general capabilities significantly exceed those of models of a similar size.

  3. Agent capability: This model reaches industry-leading levels in both reasoning and non-reasoning modes. It can achieve precise external tool invocation.

  4. Multilingual capability: This model supports over 100 languages and dialects. Capabilities in multilingual translation, instruction understanding, and common-sense reasoning are significantly improved.

    Supported languages

    English

    Simplified Chinese

    Traditional Chinese

    French

    Spanish

    Arabic, written in the Arabic script, is the official language of many Arab countries.

    Russian, written in the Cyrillic script, is the official language of Russia and some other countries.

    Portuguese, written in the Latin script, is the official language of Portugal, Brazil, and other Portuguese-speaking countries.

    German, written in the Latin script, is the official language of countries such as Germany and Austria.

    Italian, written in the Latin script, is the official language of Italy, San Marino, and parts of Switzerland.

    Dutch, written in the Latin script, is the official language of the Netherlands, parts of Belgium (Flanders), and Suriname.

    Danish, written in the Latin script, is the official language of Denmark.

    Irish, written in the Latin script, is one of the official languages of Ireland.

    Welsh, written in the Latin script, is an official language of Wales.

    Finnish, written in the Latin script, is the official language of Finland.

    Icelandic, written in the Latin script, is the official language of Iceland.

    Swedish, written in the Latin script, is the official language of Sweden.

    Norwegian Nynorsk, written in the Latin script, is an official written standard for the Norwegian language, used alongside Norwegian Bokmål.

    Norwegian Bokmål, written in the Latin script, is the predominant written standard for the Norwegian language.

    Japanese, written in the Japanese script, is the official language of Japan.

    Korean, written in Hangul, is the official language of South Korea and North Korea.

    Vietnamese, written in the Latin script, is the official language of Vietnam.

    Thai, written in the Thai script, is the official language of Thailand.

    Indonesian, written in the Latin script, is the official language of Indonesia.

    Malay, written in the Latin script, is a major language in countries such as Malaysia.

    Burmese, written in the Burmese script, is the official language of Myanmar.

    Tagalog, written in the Latin script, is one of the major languages of the Philippines.

    Khmer, written in the Khmer script, is the official language of Cambodia.

    Lao, written in the Lao script, is the official language of Laos.

    Hindi, written in the Devanagari script, is one of the official languages of India.

    Bengali, written in the Bengali script, is the official language of Bangladesh and the Indian state of West Bengal.

    Urdu, written in the Arabic script, is an official language of Pakistan and is also spoken in India.

    Nepali, written in the Devanagari script, is the official language of Nepal.

    Hebrew, written in the Hebrew script, is the official language of Israel.

    Turkish, written in the Latin script, is the official language of Türkiye and Northern Cyprus.

    Persian, written in the Arabic script, is the official language of countries such as Iran and Tajikistan.

    Polish, written in the Latin script, is the official language of Poland.

    Ukrainian, written in the Cyrillic script, is the official language of Ukraine.

    Czech, written in the Latin script, is the official language of the Czech Republic.

    Romanian, written in the Latin script, is the official language of Romania and Moldova.

    Bulgarian, written in the Cyrillic script, is the official language of Bulgaria.

    Slovak, written in the Latin script, is the official language of Slovakia.

    Hungarian, written in the Latin script, is the official language of Hungary.

    Slovenian, written in the Latin script, is the official language of Slovenia.

    Latvian, written in the Latin script, is the official language of Latvia.

    Estonian, written in the Latin script, is the official language of Estonia.

    Lithuanian, written in the Latin script, is the official language of Lithuania.

    Belarusian, written in the Cyrillic script, is one of the official languages of Belarus.

    Greek, written in the Greek script, is the official language of Greece and Cyprus.

    Croatian, written in the Latin script, is the official language of Croatia.

    Macedonian, written in the Cyrillic script, is the official language of North Macedonia.

    Maltese, written in the Latin script, is the official language of Malta.

    Serbian, written in the Cyrillic script, is the official language of Serbia.

    Bosnian, written in the Latin script, is one of the official languages of Bosnia and Herzegovina.

    Georgian, written in the Georgian script, is the official language of Georgia.

    Armenian, written in the Armenian script, is the official language of Armenia.

    North Azerbaijani, written in the Latin script, is the official language of Azerbaijan.

    Kazakh, written in the Cyrillic script, is the official language of Kazakhstan.

    Northern Uzbek, written in the Latin script, is the official language of Uzbekistan.

    Tajik, written in the Cyrillic script, is the official language of Tajikistan.

    Swahili, written in the Latin script, is a lingua franca or an official language in many East African countries.

    Afrikaans, written in the Latin script, is mainly spoken in South Africa and Namibia.

    Cantonese is written in Traditional Chinese characters. It is a primary language in Guangdong Province, Hong Kong, and Macao.

    Luxembourgish, written in the Latin script, is spoken in Luxembourg and parts of Germany, and is an official language of Luxembourg.

    Limburgish, written in the Latin script, is mainly spoken in the Netherlands, Belgium, and parts of Germany.

    Catalan, written in the Latin script, is spoken in Catalonia and other parts of Spain.

    Galician, written in the Latin script, is mainly spoken in the Galicia region of Spain.

    Asturian, written in the Latin script, is mainly spoken in the Asturias region of Spain.

    Basque, written in the Latin script, is spoken in the Basque Country of Spain and France. It is an official language of the Basque Autonomous Community in Spain.

    Occitan, written in the Latin script, is mainly spoken in the southern regions of France.

    Venetian, written in the Latin script, is mainly spoken in the Veneto region of Italy.

    Sardinian, written in the Latin script, is mainly spoken on the island of Sardinia in Italy.

    Sicilian, written in the Latin script, is mainly spoken on the island of Sicily in Italy.

    Friulian, written in the Latin script, is mainly spoken in the Friuli-Venezia Giulia region of Italy.

    Lombard, written in the Latin script, is mainly spoken in the Lombardy region of Italy.

    Ligurian, written in the Latin script, is mainly spoken in the Liguria region of Italy.

    Faroese, written in the Latin script, is an official language of the Faroe Islands.

    Tosk Albanian, written in the Latin script, is the southern dialect of Albanian.

    Silesian, written in the Latin script, is mainly spoken in Poland.

    Bashkir, written in the Cyrillic script, is mainly spoken in Bashkortostan, Russia.

    Tatar, written in the Cyrillic script, is mainly spoken in Tatarstan, Russia.

    Mesopotamian Arabic, written in the Arabic script, is mainly spoken in Iraq.

    Najdi Arabic, written in the Arabic script, is mainly spoken in the Najd region of Saudi Arabia.

    Egyptian Arabic, written in the Arabic script, is mainly spoken in Egypt.

    Levantine Arabic, written in the Arabic script, is mainly spoken in Syria and Lebanon.

    Ta'izzi-Adeni Arabic, written in the Arabic script, is mainly spoken in Yemen and the Hadhramaut region of Saudi Arabia.

    Dari, written in the Arabic script, is one of the official languages of Afghanistan.

    Tunisian Arabic, written in the Arabic script, is mainly spoken in Tunisia.

    Moroccan Arabic, written in the Arabic script, is mainly spoken in Morocco.

    Kabuverdianu, written in the Latin script, is mainly spoken in Cape Verde.

    Tok Pisin, written in the Latin script, is one of the main lingua francas in Papua New Guinea.

    Eastern Yiddish, written in the Hebrew script, is mainly spoken in Jewish communities.

    Sindhi, written in the Arabic script, is an official language of the Sindh province in Pakistan.

    Sinhala, written in the Sinhala script, is one of the official languages of Sri Lanka.

    Telugu, written in the Telugu script, is an official language of the Andhra Pradesh and Telangana states in India.

    Punjabi, written in the Gurmukhi script, is an official language of India spoken in the Punjab state.

    Tamil, written in the Tamil script, is an official language of the Tamil Nadu state in India and of Sri Lanka.

    Gujarati, written in the Gujarati script, is an official language of the Gujarat state in India.

    Malayalam, written in the Malayalam script, is an official language of the Kerala state in India.

    Marathi, written in the Devanagari script, is an official language of the Maharashtra state in India.

    Kannada, written in the Kannada script, is an official language of the Karnataka state in India.

    Magahi, written in the Devanagari script, is mainly spoken in the Bihar state of India.

    Odia, written in the Odia script, is an official language of the Odisha state in India.

    Awadhi, written in the Devanagari script, is mainly spoken in the Uttar Pradesh state of India.

    Maithili, written in the Devanagari script, is an official language of India spoken in the Bihar state and the Terai plains of Nepal.

    Assamese, written in the Bengali script, is an official language of the Assam state in India.

    Chhattisgarhi, written in the Devanagari script, is mainly spoken in the Chhattisgarh state of India.

    Bhojpuri, written in the Devanagari script, is spoken in parts of India and Nepal.

    Minangkabau, written in the Latin script, is mainly spoken on the island of Sumatra in Indonesia.

    Balinese, written in the Latin script, is mainly spoken on the island of Bali in Indonesia.

    Javanese is written in the Latin script but also commonly uses the Javanese script. It is widely spoken on the island of Java in Indonesia.

    Banjar, written in the Latin script, is mainly spoken on the island of Kalimantan in Indonesia.

    Sundanese is written in the Latin script but traditionally uses the Sundanese script. It is mainly spoken in the western part of the island of Java in Indonesia.

    Cebuano, written in the Latin script, is mainly spoken in the Cebu region of the Philippines.

    Pangasinan, written in the Latin script, is mainly spoken in the Pangasinan province of the Philippines.

    Iloko, written in the Latin script, is mainly spoken in the Philippines.

    Waray (Philippines), written in the Latin script, is mainly spoken in the Philippines.

    Haitian Creole, written in the Latin script, is one of the official languages of Haiti.

    Papiamento, written in the Latin script, is mainly spoken in Caribbean regions such as Aruba and Curaçao.

  5. Response format fixes: Fixes response format issues from previous versions, such as abnormal Markdown, intermediate truncation, and incorrect boxed output.

The Qwen3 open-source models released in April 2025 do not support non-streaming output in thinking mode.
If a Qwen3 open-source model is in thinking mode but does not output a thinking process, it is billed at the non-thinking mode price.

Thinking mode | Non-thinking mode | Usage

Model

Mode

Context window

Maximum input

Maximum chain-of-thought

Maximum response

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.5

$6

1 million tokens per model

Valid for 90 days after you activate Alibaba Cloud Model Studio

qwen3-next-80b-a3b-instruct

Non-thinking only

129,024

-

$0.5

$2

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.7

$8.4

qwen3-235b-a22b-instruct-2507

Non-thinking only

129,024

-

$0.7

$2.8

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.2

$2.4

qwen3-30b-a3b-instruct-2507

Non-thinking only

129,024

-

$0.8

qwen3-235b-a22b

This model and the following models are scheduled for release in April 2025.

Non-thinking

129,024

-

16,384

$0.7

$2.8

Thinking

98,304

38,912

$8.4

qwen3-32b

Non-thinking

129,024

-

$2.8

Thinking

98,304

38,912

$8.4

qwen3-30b-a3b

Non-thinking

129,024

-

$0.2

$0.8

Thinking

98,304

38,912

$2.4

qwen3-14b

Non-thinking

129,024

-

8,192

$0.35

$1.4

Thinking

98,304

38,912

$4.2

qwen3-8b

Non-thinking

129,024

-

$0.18

$0.7

Thinking

98,304

38,912

$2.1

qwen3-4b

Non-thinking

129,024

-

$0.11

$0.42

Thinking

98,304

38,912

$1.26

qwen3-1.7b

Non-thinking

32,768

30,720

-

$0.42

Thinking

28,672

The total number of input and output tokens cannot exceed 30,720.

$1.26

qwen3-0.6b

Non-thinking

30,720

-

$0.42

Thinking

28,672

The total number of input and output tokens cannot exceed 30,720.

$1.26

Qwen2.5

Qwen2.5 is a series of Qwen large language models. For Qwen2.5, we have released a series of base language models and instruction-tuned language models with parameter sizes ranging from 7 billion to 72 billion. Qwen2.5 includes the following improvements over Qwen2:

  • It is pre-trained on our latest large-scale dataset, which contains up to 18 trillion tokens.

  • Our specialized expert models in these fields have significantly increased the model's knowledge and greatly improved its coding and math capabilities.

  • It has significant improvements in following instructions, generating long text (over 8K tokens), understanding structured data (such as tables), and generating structured output (especially JSON). It is more resilient to the diversity of system prompts, which enhances the implementation of role-play and conditional settings for chatbots.

  • It supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.

UsageAPI referenceTry it online

Model

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Tokens)

(Million tokens)

qwen2.5-14b-instruct-1m

1,008,192

1,000,000

8,192

$0.805

$3.22

1 million tokens for each model

Valid for 90 days after activating Model Studio

qwen2.5-7b-instruct-1m

$0.368

$1.47

qwen2.5-72b-instruct

131,072

129,024

$1.4

$5.6

qwen2.5-32b-instruct

$0.7

$2.8

qwen2.5-14b-instruct

$0.35

$1.4

qwen2.5-7b-instruct

$0.175

$0.7

Qwen-Omni

A new multimodal large model for understanding and generation, trained on Qwen2.5. It supports text, image, speech, and video inputs, and can generate text and speech simultaneously in a stream. The speed of multimodal content understanding is significantly improved.Usage | API reference

Model

Context window

Maximum input

Maximum output

Free quota (Note)

(Tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

1 million tokens (regardless of modality)

Valid for 90 days after activation

After the free quota is used up, the following billing rules apply to inputs and outputs:

Input billing item

Price (Million tokens)

Text

$0.10

Audio

$6.76

Image/Video

$0.28

Output billing item

Price (Million tokens)

Text

$0.40 (if the input contains only text)

$0.84 (if the input contains images, audio, or video)

Text and audio

$13.51 (for audio)

The text portion of the output is not billed.

Qwen3-Omni-Captioner

Qwen3-Omni-Captioner is an open-source model based on Qwen3-Omni. Without requiring prompts, it automatically generates accurate and comprehensive descriptions for complex audio that includes speech, ambient sounds, music, and sound effects. The model can detect speaker emotions, music elements such as style and instruments, and sensitive information. It is ideal for applications such as audio content analysis, security audits, intent recognition, and audio editing. Usage | API reference

Model name

Context length

Max input

Max output

Input cost

Output cost

Free quota

(Note)

(Tokens)

(Per 1,000,000 tokens)

qwen3-omni-30b-a3b-captioner

65,536

32,768

32,768

$3.81

$3.06

1,000,000 tokens

Valid for 90 days after you activate Alibaba Cloud Model Studio.

Qwen-VL

This is the open-source version of Alibaba Cloud's Qwen-VL. Usage | API reference

The Qwen3-VL model offers significant improvements over Qwen2.5-VL:

  • Agent interaction: It operates computer or mobile phone interfaces, detects GUI elements, understands features, and invokes tools to perform tasks. It achieves top-tier performance in evaluations such as OS World.

  • Visual encoding: It generates code from images or videos. This feature can be used to create HTML, CSS, and JS code from design drafts or website screenshots.

  • Spatial intelligence: It supports 2D and 3D positioning and accurately determines object orientation, perspective changes, and occlusion relationships.

  • Long video understanding: It understands video content up to 20 minutes long and can pinpoint specific moments with second-level accuracy.

  • Deep thinking: It excels at capturing details and analyzing causality, achieving top-tier performance in evaluations such as MathVista and MMMU.

  • OCR: It supports 33 languages and performs more stably in scenarios with complex lighting, blur, or tilt. It also significantly improves the accuracy of recognizing rare characters, ancient script, and technical terms.

    Supported languages

    The model supports the following 33 languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Türkiye, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi/Devanagari, and Hebrew.

Qwen3-VL

Model

Mode

Context window

Maximum input

Maximum chain-of-thought

Maximum response length

Input price

Output price

Chain-of-thought + output

Free quota

(Note)

(Tokens)

(Million tokens)

qwen3-vl-30b-a3b-thinking

Thinking mode only

131,072

126,976

81,920

32,768

$0.2

$2.4

1 million tokens each

Valid for 90 days after Model Studio activation.

qwen3-vl-30b-a3b-instruct

Non-thinking mode only

129,024

-

$0.8

qwen3-vl-235b-a22b-thinking

Thinking only

126,976

81,920

$0.7

$8.4

qwen3-vl-235b-a22b-instruct

Non-thinking mode only

129,024

-

$2.8

Qwen2.5-VL

Model

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(Million tokens)

qwen2.5-vl-72b-instruct 

131,072

129,024

Max 16,384 per image

8,192

$2.8

$8.4

1 million tokens for input and output each

Validity: 90 days after you activate Alibaba Cloud Model Studio

qwen2.5-vl-32b-instruct

$1.4

$4.2

qwen2.5-vl-7b-instruct

$0.35

$1.05

qwen2.5-vl-3b-instruct

$0.21

$0.63

Qwen-Coder

Qwen-Coder is an open source code model from Qwen. The latest, qwen3-coder-480b-a35b-instruct, is a code generation model based on Qwen3 with powerful Coding Agent capabilities. It excels at tool calling, environment interaction, and autonomous programming. The model combines excellent coding skills with general-purpose capabilities.Usage | API reference

Model

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing applies, see the description below the table.

1 million tokens each

Validity: Within 90 days after you activate Model Studio

qwen3-coder-30b-a3b-instruct

The qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct models use tiered billing based on the number of input tokens per request (left-open, right-closed intervals).

Model

Number of input tokens

Input price (Million tokens)

Output price (Million tokens)

qwen3-coder-480b-a35b-instruct

0–32K

$1.5

$7.5

32K–128K

$2.7

$13.5

128K–200K

$4.5

$22.5

qwen3-coder-30b-a3b-instruct

0–32K

$0.45

$2.25

32K–128K

$0.75

$3.75

128K–200K

$1.2

$6

Image generation

Qwen text-to-image

The Qwen text-to-image model excels at complex text rendering, especially for Chinese and English text. Currently, qwen-image-plus has the same capabilities as qwen-image, but qwen-image-plus has lower price. API reference

Model

Unit price

Free quota

qwen-image-plus

$0.03 per image

Free quota: 100 images for each model

Validity period: Within 90 days after you activate Alibaba Cloud Model Studio.

qwen-image

$0.035 per image

Input prompt

Output image

Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere.

image

Qwen image editing

The Qwen image editing model supports precise text editing in Chinese and English. It also supports operations such as color adjustment, detail enhancement, style transfer, adding or removing objects, and changing positions and actions. These features enable complex editing of images and text. API reference

Model

Unit price

Free quota

qwen-image-edit

$0.045/image

Free quota: 100 images

Valid for 90 days after activating Alibaba Cloud Model Studio.

dog_and_girl (1)

Original image

狗修改图

Change the person to a standing position, bending over to hold the dog's front paws.

image

Original image

image

Replace the words 'HEALTH INSURANCE' on the letter blocks with '明天会更好' (Tomorrow will be better).

5

Original image

5out

Replace the dotted shirt with a light blue shirt.

6

Original image

6out

Change the background in the image to Antarctica.

7

Original image

7out

Generate a cartoon profile picture of the person.

image

Original image

image

Remove the hair from the dinner plate.

Wan text-to-image

The Wan text-to-image model generates exquisite images from text. API reference | Try it online

Model

Description

Unit price

Free quota (Note)

The free quota is valid for 90 days after you activate Alibaba Cloud Model Studio.

wan2.5-t2i-preview Recommended

The Wan 2.5 preview removes the single-side limitation, allowing you to freely select image dimensions within the constraints of total pixel area and aspect ratio.

$0.03/image

50 images

wan2.2-t2i-plus Recommended

The Wan 2.2 Professional Edition features comprehensive upgrades that enhance creativity, stability, and photorealistic quality.

$0.05/image

100 images

wan2.2-t2i-flash Recommended

The Wan 2.2 Express Edition features comprehensive upgrades that enhance creativity, stability, and photorealistic quality.

$0.025/image

100 images

wan2.1-t2i-plus

The Wan 2.1 Professional Edition generates images with richer details.

$0.05/image

200 images

wan2.1-t2i-turbo

The Wan 2.1 Turbo Edition offers balanced performance and high cost-effectiveness.

$0.025/image

200 images

Input prompt

Output image

A needle-felted Santa Claus holding a gift and a white cat standing next to him, with a background of colorful gifts and green plants creating a cute, warm, and cozy scene.

image

Wan2.5 general image editing

The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference

Model

Unit price

Free quota(Note)

Valid for 90 days after you activate Alibaba Cloud Model Studio.

wan2.5-i2i-preview

$0.03/image

50 images

Feature

Input example

Output image

Single-image editing

damotest2023_Portrait_photography_outdoors_fashionable_beauty_409ae3c1-19e8-4515-8e50-b3c9072e1282_2-转换自-png

Replace the floral dress with a vintage-style lace gown that has delicate embroidery on the collar and cuffs.

a26b226d-f044-4e95-a41c-d1c0d301c30b-转换自-png

Multi-image fusion

图像编辑2图像编辑2

Place the alarm clock from Image 1 next to the vase on the dining table in Image 2.

图像编辑2

Video generation - Wan

Text-to-video

The Wan text-to-video model generates videos from a single sentence. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

Model

Description

Unit price

Free quota(Claim)

Valid for 90 days after activating Alibaba Cloud Model Studio

wan2.5-t2v-previewRecommended

Wan 2.5 preview supports automatic dubbing and custom audio files.

480P: $0.05/second

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.2-t2v-plus Recommended

Wan 2.2 professional edition. This model provides significant improvements in image detail and motion stability.

480P: $0.02/second

1080P: $0.10/second

50 seconds

wan2.1-t2v-turbo

Wan 2.1 speed edition. This model provides fast generation and balanced performance.

$0.036/second

200 seconds

wan2.1-t2v-plus

Wan 2.1 professional edition. This model generates videos with rich details and enhanced texture.

$0.10/second

200 seconds

Sample prompt

Generated video

Prompt: A kitten running in the moonlight

Image-to-video - based on the first frame

The Wan image-to-video model uses an input image as the first frame of a video. It then generates the rest of the video based on a prompt. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

Model

Description

Unit price

Free quota (Note)

Valid for 90 days after activating Model Studio

wan2.5-i2v-previewRecommended

Wan 2.5 preview supports automatic dubbing and custom audio files.

480P: $0.05/second

720P: $0.10/second

1080P: $0.15/second

50 seconds

wan2.2-i2v-flash Recommended

Wan 2.2 Turbo Edition. This model offers extremely fast generation speeds with significant improvements in image detail and motion stability.

480P: $0.015/second

720P: $0.036/second

50 seconds

wan2.2-i2v-plus Recommended

Wan 2.2 Professional Edition. This model provides significant improvements in image detail and motion stability.

480P: $0.02/second

1080P: $0.10/second

50 seconds

wan2.1-i2v-turbo

Wan 2.1 Turbo Edition. This model offers fast generation speeds and balanced performance.

$0.036/second

200 seconds

wan2.1-i2v-plus

Wan 2.1 Professional Edition. This model generates videos with rich details and enhanced textures.

$0.10/second

200 seconds

Input example

Output video

Input prompt: A cat running on the grass

Input image:

image

The model generates a video based on the prompt, using the input image as the first frame.

Model: wanx2.1-i2v-turbo.

Image-to-video - based on the first and last frames

The Wan first-and-last-frame video model generates a smooth, dynamic video from a prompt. You only need to provide the first and last frame images. The videos feature rich artistic styles and cinematic quality. API reference | Try it online

Model

Price

Free quota (Note)

wan2.1-kf2v-plus

$0.10 per second

200 seconds

Valid for 90 days after you activate Model Studio

Example input

Output video

First frame

Last frame

Prompt

first_frame

last_frame

In a realistic style, the camera starts at eye level on a small black cat looking up at the sky with curiosity, then gradually moves upward to end in a top-down shot focused on the cat's curious eyes.

General video editing

The Wan unified video editing model supports multimodal inputs, including text, images, and videos. It can perform video generation and general editing tasks. API reference | Try it online

Model

Price

Free quota

wan2.1-vace-plus

$0.10 per second

50 seconds

Valid for 90 days after you activate Model Studio.

The unified video editing model supports the following features:

Feature

Input reference image

Input prompt

Output video

Multi-image reference

Reference image 1 (reference entity)

image

Reference image 2 (reference background)

image

In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records the girl's wonderful encounter with nature.

Output video

Video repainting

The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.

Local editing

Input video

Input mask image (The white area indicates the editing area)

mask

The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.

The content in the editing area is modified based on the prompt.

Video extension

Input first clip (1 second)

A dog wearing sunglasses is skateboarding on the street, 3D cartoon.

Output extended video (5 seconds)

Video outpainting

An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.

Speech synthesis (text-to-speech)

Qwen-TTS

Model

Version

Unit price

Maximum input characters

Supported languages

Free quota(Note)

qwen3-tts-flash

Currently same capabilities as qwen3-tts-flash-2025-09-18

Stable

$0.1/10,000 characters

600

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

2,000 characters for each

Validity: 90 days after activating Model Studio

qwen3-tts-flash-2025-09-18

Snapshot

Qwen-TTS-Realtime

Model

Version

Price

Supported languages

Free quota(Note)

qwen3-tts-flash-realtime

Currently same performance as qwen3-tts-flash-realtime-2025-09-18

Stable

$0.13 per 10,000 characters

Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese

2,000 characters for each

Validity period: 90 days after you activate Model Studio

qwen3-tts-flash-realtime-2025-09-18

Snapshot

Speech recognition and translation (speech-to-text)

Qwen3-LiveTranslate-Flash-Realtime

qwen3-livetranslate-flash-realtime is a multilingual, real-time audio and video translation model. It recognizes 18 languages and translates them into audio in 10 languages in real time.

Core features:

  • Multilingual support: Supports 18 languages, such as Chinese, English, French, German, Russian, Japanese, and Korean, and 6 Chinese dialects, such as Mandarin, Cantonese, and Sichuanese.

  • Visual enhancement: Improves translation accuracy using visual content. The model analyzes lip movements, actions, and on-screen text to improve translation in noisy environments or for words with multiple meanings.

  • Low latency: Achieves a simultaneous interpretation latency as low as 3 seconds.

  • Lossless simultaneous interpretation: Uses semantic unit prediction technology to resolve cross-language word order issues. This ensures that the quality of real-time translation is nearly identical to that of offline translation.

  • Natural voice: Generates human-like speech with a natural voice. The model adapts its tone and emotion based on the source audio content.

Usage

Model

Version

Context window

Maximum input

Maximum output

Free quota

(Note)

(Tokens)

qwen3-livetranslate-flash-realtime

Current capabilities are equivalent to qwen3-livetranslate-flash-realtime-2025-09-22

Stable

53248

49,152

4,096

1 million tokens for each

Validity: 90 days after activating Alibaba Cloud Model Studio

qwen3-livetranslate-flash-realtime-2025-09-22

Snapshot

After the free quota is exhausted, inputs and outputs are billed as follows:

Input billing item

Price (Million tokens)

Input: Audio

$10

Input: Image

$1.3

Output billing item

Price (Million tokens)

Text

$10

Audio

$38

Fun-ASR

Fun-ASR is an end-to-end, large-scale automatic speech recognition (ASR) model from Qwen Lab. It is built on advanced, self-developed speech technology and provides excellent contextual awareness and high-accuracy transcription. API reference.

Audio file recognition

Model

Version

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Price

Free quota

fun-asr

Currently equivalent to fun-asr-2025-08-25

Stable

Chinese, English

Any

ApsaraVideo Live, voice calls, real-time conference interpretation, and more

aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

$0.000035/second

36,000 seconds (10 hours)

Valid for 90 days

fun-asr-2025-08-25

Snapshot

Text embedding

Text embedding models convert text into numerical representations for tasks such as search, clustering, recommendation, and classification. Billing for these models is based on the number of input tokens. API reference

Model

Embedding dimensions

Batch size

Maximum tokens per row

Supported languages

Price

(Million input tokens)

Free quota

(Note)

text-embedding-v3

1,024 (default), 768, or 512

10

8,192

Over 50 languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian

$0.07

500,000 tokens

Valid for 90 days after Model Studio activation.

Role-playing

Qwen's role-playing model is ideal for scenarios that require human-like conversation, such as virtual social interactions, game NPCs, IP character replication, hardware, toys, and in-vehicle systems. This model offers enhanced capabilities in character fidelity, conversation progression, and empathetic listening compared to other Qwen models. Usage

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-plus-character-ja

8,192

7,680

512

$0.5

$1.4

Retired models (Singapore region)

Retired on August 20, 2025

Qwen2

Alibaba Cloud's open-source Qwen2.Usage | API reference | Try it online

Model

Context window

Maximum input

Maximum output

Input price

Output price

Alternative models

(Tokens)

(Million tokens)

qwen2-72b-instruct

131,072

128,000

6,144

Free for a limited time

Qwen3, DeepSeek, Kimi, and others

qwen2-57b-a14b-instruct

65,536

63,488

qwen2-7b-instruct

131,072

128,000

Qwen1.5

Alibaba Cloud's open-source Qwen1.5.Usage | API reference | Try it online

Model Name

Context Window (Tokens)

Maximum Input (Tokens)

Maximum Output (Tokens)

Input Price (Million tokens)

Output Price (Million tokens)

Alternative Models

Tokens

(Million tokens)

qwen1.5-110b-chat

8,000

6,000

2,000

Free for a limited time

Qwen3, DeepSeek, Kimi, and others

Qwen1.5-72B-Chat

qwen1.5-32b-chat

qwen1.5-14b-chat

qwen1.5-7b-chat

Flagship models (Beijing region)

Most powerful general-purpose models

通义new Qwen-Max

Suitable for complex tasks, most powerful

通义new Qwen-Plus

Balanced performance, speed, and cost

通义new Qwen-Turbo

Suitable for simple jobs, fast and low cost

通义new Qwen-Coder

Excellent at coding and proficient in tool calling and environment interaction

Maximum context window

(Tokens)

262,144

1,000,000

1,000,000

1,000,000

Input price

(Million tokens)

$0.345

$0.115

$0.044

$0.287

Output price

(Million tokens)

$1.377

$0.287

$0.087

$0.861

For detailed parameters and more models, see the tables that follow.

Model overview

Category

Model

Description

Text generation

General-purpose large language model

Multimodal model

The visual understanding model Qwen-VL, the visual reasoning model QVQ, and the omni-modal model Qwen-Omni

Realm model

Code model, Math model, Translation model, Data mining model, Intention recognition model, Role assumption model

Image generation

Text-to-image

  • Qwen text-to-image: Excels at complex text rendering, especially for Chinese and English text.

  • Wan text-to-image: Generates certificate photos, E-commerce main images, model photos, and portrait photos in various styles, such as anime, Chinese style, and 2D style.

Wan

General-purpose models:

  • Qwen Image Editing: Supports Chinese and English prompts for complex image and text editing operations, such as style transfer, text modification, and object editing.

  • Wan Image Editing: Generates or edits images. You can create certificate photos, E-commerce main images, model photos, and portraits in various styles, such as anime, Chinese style, and ACG. You can also remove backgrounds, generate backgrounds, and change image elements.

More models: Qwen Image Translation, OutfitAnyone

Speech synthesis and recognition

Speech synthesis

Qwen-TTS and CosyVoice convert text to speech for scenarios such as voice-based customer service, audiobooks, in-car navigation, and educational tutoring.

Speech recognition and translation

Paraformer converts speech to text for scenarios such as real-time meeting transcription, real-time live stream captioning, and customer service calls.

Video editing and generation

Text-to-video

  • Text-to-Video: Generates high-quality videos in a wide variety of styles from a single sentence.

Image-to-video

  • First-frame-to-video: Generates a video from an initial image and a prompt.

  • First-and-last-frame-to-video: Generates a video with a natural transition based on the first and last frame images and a prompt.

  • Multi-image-to-video: Generates a video from one or more images and a text prompt, based on the entities or backgrounds in the source images.

  • Dance video generation: AnimateAnyone generates dance videos from a character image and an action video.

  • Lip-sync video generation from an image and audio

    • Wan-digital human generates video from a portrait image and audio. It provides a wide and natural range of motion, supports various frame sizes such as full-body, half-body, and portrait, and is suitable for singing and performance scenarios.

    • EMO uses a person's image and audio to generate video with highly expressive lip-syncing and facial expressions. It supports portrait and half-body shots and is ideal for close-up scenarios.

    • LivePortrait uses a portrait image and an audio file and is ideal for voice narration scenarios.

  • Emoji video generation: Emoji generates facial emoji videos from facial images and preset dynamic facial templates.

General video editing

  • General video editing: Performs various video editing tasks based on text prompts, images, and videos. For example, you can generate a new video by extracting motion features from an input video and combining them with a text prompt.

  • Video lip-syncing: VideoRetalk uses a person's video and audio and is ideal for short video production and video translation.

  • Video style transfer: Video Style Repainting transforms videos into various styles, such as Japanese manga and American comics.

Embedding

Text embedding

Converts text into a numerical vector representation. These embeddings are used for search, clustering, recommendation, and classification.

Multimodal embedding

Converts text, images, and speech into numerical vectors. These embeddings are used for audio and video classification, image classification, and image-text retrieval.

Industry

Intention recognition

The Intention Recognition Model parses user intent in milliseconds and selects the appropriate tools to resolve user issues.

Text generation - Qwen

The following are the Qwen commercial models. Compared to the open-source editions, the commercial models have the latest capabilities and improvements.

Models are updated and upgraded periodically. To use a fixed version, you can select a snapshot. Snapshots are typically maintained for one month after the release of the next snapshot.
We recommend that you use the stable or latest version because their rate limits are looser.

Qwen-Max

This is the best-performing model in the Qwen series. The model is suitable for complex and multi-step tasks. Usage | API reference | Try it online

Qwen3-Max

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen3-max

Currently same capabilties as qwen3-max-2025-09-23

Stable

262,144

258,048

65,536

Tiered pricing. See the notes below this table.

qwen3-max-2025-09-23

Snapshot

qwen3-max-preview

Preview

Qwen3-Max uses tiered pricing based on the number of input tokens (left-open, right-closed intervals).

Input Tokens

Input Price (Million tokens)

qwen3-max and qwen3-max-preview support context cache.

Output Price (Million tokens)

0-32K

$0.861

$3.441

32K-128K

$1.434

$5.735

128K-252K

$2.151

$8.602

qwen3-max and qwen3-max-2025-09-23 support search agent, see Web search.

Qwen-Max

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-max

Offers the same capabilities as qwen-max-2024-09-19.

Stable

32,768

30,720

8,192

$0.345

$1.377

qwen-max-latest

Always points to the latest snapshot.

Latest

131,072

129,024

qwen-max-2025-01-25

Also known as qwen-max-0125, Qwen2.5-Max

Snapshot

qwen-max-2024-09-19

Also known as qwen-max-0919.

32,768

30,720

$2.868

$8.602

Qwen-Plus

This is a balanced model. Its inference performance, cost, and speed are between those of Qwen-Max and Qwen-Turbo. It is ideal for moderately complex tasks.

Usage | API reference | Try it online | Thinking mode

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-plus

Same capabilities as qwen-plus-2025-07-28.
Part of the Qwen3 series.

Stable

1,000,000

Thinking mode

995,904

Non-thinking mode

997,952

The default values are 131,072. You can adjust this value using the max_input_tokens parameter.

32,768

Maximum CoT is 81,920.

Tiered pricing, see the description below the table.

qwen-plus-latest

Same capabilities as qwen-plus-2025-07-28.
Part of the Qwen3 series.

Latest

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-09-11

Part of the Qwen3 series.

Snapshot

Thinking mode

995,904

Non-thinking mode

997,952

qwen-plus-2025-07-28

Also known as qwen-plus-0728.
Part of the Qwen3 series.

qwen-plus-2025-07-14

Also known as qwen-plus-0714.
Part of the Qwen3 series.

131,072

Thinking mode

98,304

Non-thinking mode

129,024

16,384

Maximum CoT is 38,912.

$0.115

Thinking mode

$1.147

Non-thinking mode

$0.287

qwen-plus-2025-04-28

Also known as qwen-plus-0428.
Part of the Qwen3 series

The qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11, and qwen-plus-2025-07-28 models use tiered pricing based on the number of input tokens per request (left-open, right-closed intervals).

Input tokens

Input price (Million tokens)

Mode

Output price (Million tokens)

0-128K

$0.115

Non-thinking mode

$0.287

Thinking mode

$1.147

128K-256K

$0.345

Non-thinking mode

$2.868

Thinking mode

$3.441

256K-1M

$0.689

Non-thinking mode

$6.881

Thinking mode

$9.175

These models support thinking and non-thinking modes. You can switch between the two modes using the enable_thinking parameter. In addition, the model's capabilities have been significantly enhanced:

  1. Inference capability: In evaluations of math, code, and logical reasoning, it significantly surpasses QwQ and non-reasoning models of the same size, which reaches the top tier in the industry for its scale.

  2. Human preference capability: Creative writing, role-play, multi-turn conversation, and instruction-following capabilities have all been greatly improved. Its general capabilities significantly exceed those of models of the same size.

  3. Agent capability: It achieves industry-leading levels in both thinking and non-thinking modes and can accurately call external tools.

  4. Multilingual capability: It supports over 100 languages and dialects, with significant improvements in multilingual translation, instruction understanding, and common-sense reasoning.

  5. Response format: This version fixes response format issues from previous versions, such as abnormal Markdown, intermediate truncation, and incorrect boxed output.

For these models, if you enable thinking mode but no thought process is output, you are charged at the non-thinking mode rate.

More historical snapshot models

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-plus-2025-01-25

Alias: qwen-plus-0125

Snapshot

131,072

129,024

8,192

$0.115

$0.287

qwen-plus-2025-01-12

Also known as qwen-plus-0112.

qwen-plus-2024-12-20

Also known as qwen-plus-1220

qwen-plus-2024-11-27

Also known as qwen-plus-1125.

qwen-plus-2024-11-25

Also known as qwen-plus-1125.

qwen-plus-2024-09-19

Also known as qwen-plus-0919

qwen-plus-2024-08-06

Also known as qwen-plus-0806.

128,000

$0.574

$1.721

Qwen-Flash

This is the fastest and most cost-effective model in the Qwen series. It is ideal for simple tasks. Qwen-Flash uses flexible tiered pricing for more reasonable billing. UsageAPI referenceDeep thinking

Model

Version

Context window

Maximum input

Maximum chain-of-thought

Maximum response

Input price

Output price

(Tokens)

(Million tokens)

qwen-flash

Provides the same capabilities as qwen-flash-2025-07-28.
A model in the Qwen3 series.

Stable

1,000,000

1,044,480

32,768

81,920

Tiered pricing applies. For more information, see the description below this table.

qwen-flash-2025-07-28

Also known as qwen-flash-0728.

Snapshot

The qwen-flash and qwen-flash-2025-07-28 models use tiered pricing based on the number of input tokens per request (left-open, right-closed intervals). The qwen-flash model supports context cache.

Input tokens

Input price (Million tokens)

Output price (Million tokens)

0–128K

$0.022

$0.216

128K–256K

$0.087

$0.861

256K–1M

$0.173

$1.721

Qwen-Turbo

Qwen-Turbo is no longer updated. We recommend replacing it with Qwen-Flash. Qwen-Flash uses flexible tiered pricing for more reasonable billing. Usage | API reference | Try it online | Thinking mode

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-turbo

Functionally equivalent to qwen-turbo-2025-04-28.
Part of the Qwen3 series.

Stable

Thinking mode

131,072

Non-thinking mode

1,000,000

Thinking mode

98,304

Non-thinking mode

1,000,000

16,384

The maximum length for Chain-of-Thought (CoT) is 38,912 tokens.

$0.044

Thinking mode

$0.431

Non-thinking mode

$0.087

qwen-turbo-latest

Functionally equivalent to the latest snapshot version.
Part of the Qwen3 series.

Latest

qwen-turbo-2025-07-15

Also known as qwen-turbo-0715.
Part of the Qwen3 series.

Snapshot

qwen-turbo-2025-04-28

Also known as qwen-turbo-0428.
Part of the Qwen3 series.

These models support thinking and non-thinking modes. You can switch between the two modes using the enable_thinking parameter. In addition, the model's capabilities have been significantly enhanced:

  1. Inference capability: In evaluations of math, code, and logical reasoning, it significantly surpasses QwQ and non-reasoning models of the same size, which reaches the top tier in the industry for its scale.

  2. Human preference capability: Creative writing, role-play, multi-turn conversation, and instruction-following capabilities have all been greatly improved. Its general capabilities significantly exceed those of models of the same size.

  3. Agent capability: It achieves industry-leading levels in both thinking and non-thinking modes and can accurately call external tools.

  4. Multilingual capability: It supports over 100 languages and dialects, with significant improvements in multilingual translation, instruction understanding, and common-sense reasoning.

  5. Response format: This version fixes response format issues from previous versions, such as abnormal Markdown, intermediate truncation, and incorrect boxed output.

For these models, if you enable thinking mode but no thought process is output, you are charged at the non-thinking mode rate.

More historical snapshot models

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-turbo-2025-02-11

Alias: qwen-turbo-0211

Snapshot

1,000,000

1,000,000

8,192

$0.044

$0.087

qwen-turbo-2024-11-01

Alias: qwen-turbo-1101

qwen-turbo-2024-09-19

Alias: qwen-turbo-0919

131,072

129,024

QwQ

The QwQ reasoning model, trained on the Qwen2.5 model, significantly improves model inference capabilities through reinforcement learning. Core metrics such as math and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench) reach the level of the full-power DeepSeek-R1. Usage

Model

Version

Context window

Maximum input

Maximum CoT

Maximum response

Input price

Output price

(Tokens)

(Million tokens)

qwq-plus

Provides the same capabilities as qwq-plus-2025-03-05.

Stable

131,072

98,304

32,768

8,192

$0.230

$0.574

qwq-plus-latest

Always points to the latest snapshot.

Latest

qwq-plus-2025-03-05

Also known as qwq-plus-0305.

Snapshot

Qwen-Long

This is the model in the Qwen series with the longest context window. It offers balanced capabilities and a low cost. It is ideal for tasks such as long-text analysis, information extraction, summarization, and classification and tagging. Usage | Try it online

Model name

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-long-latest

Same capabilities as the latest snapshot version.

Stable

10,000,000

10,000,000

8,192

$0.072

$0.287

qwen-long-2025-01-25

Also known as qwen-long-0125.

Snapshot

Qwen Omni

This is a new multimodal understanding and generation large model from Qwen. It supports text, image, speech, and video input, and outputs text and audio. It provides four natural conversational voices. Usage | API reference

Model

Version

Context window

Maximum input

Maximum output

(Tokens)

qwen-omni-turbo

Offers the same capabilities as the qwen-omni-turbo-2025-03-26 snapshot.

Stable

32,768

30,720

2,048

qwen-omni-turbo-latest

Offers the same capabilities as the latest snapshot.

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326.

Snapshot

qwen-omni-turbo-2025-01-19

Also known as qwen-omni-turbo-0119.

The billing rules for input and output are as follows:

Input billable item

Unit price (Million tokens)

Text

$0.058

Audio

$3.584

Image/Video

$0.216

Output billable item

Unit price (Million tokens)

Text

$0.230 (if the input contains only text)

$0.646 (if the input contains images, audio, or video)

Text and audio

$7.168 (for audio)

Text output is not billed.

Billing example: If a request includes 1,000 text tokens and 1,000 image tokens in the input, and generates 1,000 text tokens and 1,000 audio tokens in the output, the total cost is: $0.000058 (text input) + $0.000216 (image input) + $0.007168 (audio output)

Qwen Omni-Realtime

Compared to Qwen-Omni, this model supports streaming audio input and has a built-in Voice Activity Detection (VAD) feature to automatically detect the start and end of user speech. Usage

Model

Version

Context window

Maximum input

Maximum Output

(Tokens)

qwen-omni-turbo-realtime

Offers the same capabilities as the qwen-omni-turbo-2025-05-08 snapshot.

Stable

32,768

30,720

2,048

qwen-omni-turbo-realtime-latest

This model is an alias for the latest snapshot.

Latest

qwen-omni-turbo-realtime-2025-05-08

Snapshot

The billing rules for input and output are as follows:

Enter a billing item

Unit price (Million tokens)

Text

$0.230

Audio

$3.584

Image

$0.861

Billable Outputs

Unit price (Million tokens)

Text

$0.918 (for text-only input)

$2.581 (for inputs with images or audio)

Text and audio

$7.168 (for audio)

The text output is not billed.

QVQ

QVQ is a visual reasoning model that supports visual input and chain-of-thought output. It demonstrates stronger capabilities in math, programming, visual analysis, creation, and general tasks. Usage | Try it online

Model

Version

Context window

Maximum input

Maximum CoT

Maximum response

Input price

Output price

(Tokens)

(Million tokens)

qvq-max

This model provides stronger visual reasoning and instruction-following capabilities than qvq-plus and delivers optimal performance for complex tasks.
This model has the same capabilities as qvq-max-2025-03-25.

Stable

131,072

106,496

Maximum of 16,384 per image.

16,384

8,192

$1.147

$4.588

qvq-max-latest

This model always provides the same capabilities as the latest snapshot.

Latest

qvq-max-2025-05-15

Also known as qvq-max-0515.

Snapshot

qvq-max-2025-03-25

Also known as qvq-max-0325.

qvq-plus

This model has the same capabilities as qvq-plus-2025-05-15.

Stable

$0.287

$0.717

qvq-plus-latest

This model always provides the same capabilities as the latest snapshot.

Latest

qvq-plus-2025-05-15

Also known as qvq-plus-0515.

Snapshot

Qwen-VL

Qwen-VL is a text generation model with visual (image) understanding capabilities. It comes in two series: Qwen-VL-MAX and Qwen-VL-PLUS. It can perform Optical Character Recognition (OCR) and also summarize and reason. For example, it can extract attributes from product photos or solve problems based on exercise diagrams. Usage | API reference | Try it online

Qwen-VL models are billed based on the total number of input and output tokens.
Image token calculation rule: Visual understanding.

Qwen3-VL-Plus

Model

Version

Mode

Context window

Maximum input

Maximum CoT

Maximum output

Input price

Output price

Free quota

(Note)

(Tokens)

(1,000 tokens)

qwen3-vl-plus

Currently has the same capabilities as qwen3-vl-plus-2025-09-23

Stable

Thinking

262,144

258,048

Max 16,384 per image

81,920

32,768

Tiered pricing. For more information, see the notes below the table.

No free quota

Non-thinking

262,144

260,096

Max 16,384 per image

-

qwen3-vl-plus-2025-09-23

Snapshot

Thinking

262,144

258,048

Max 16,384 per image

81,920

32,768

Non-thinking

262,144

260,096

Max 16,384 per image

-

The qwen3-vl-plus and qwen3-vl-plus-2025-09-23 models use a tiered billing method based on the number of input tokens in each request. The tier ranges are left-open and right-closed. The input and output prices are the same for both the thinking and non-thinking modes.

Number of input tokens

Input price (Million tokens)

Output price (Million tokens)

0 to 32K

$0.143353

$1.433525

32K to 128K

$0.215029

$2.150288

128K to 256K

$0.430058

$4.300576

Qwen-VL-Max

This is the most powerful model in the Qwen-VL series. The following models belong to the Qwen2.5-VL series.

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-vl-max

Offers further improvements in visual reasoning and instruction following capabilities compared to qwen-vl-plus, delivering optimal performance on more complex tasks.
Currently has the same capabilities as qwen-vl-max-2025-08-13

Stable

131,072

129,024

Max 16,384 per image

8,192

$0.23

$0.574

qwen-vl-max-latest

Always has the same capabilities as the latest snapshot

Latest

qwen-vl-max-2025-08-13

Also known as qwen-vl-max-0813
Features comprehensive improvements in visual understanding metrics, with significantly enhanced capabilities in mathematics, reasoning, object detection, and multilingual processing.

Snapshot

qwen-vl-max-2025-04-08

Also known as qwen-vl-max-0408
Enhanced mathematics and reasoning capabilities

$0.431

$1.291

qwen-vl-max-2025-04-02

Also known as qwen-vl-max-0402
Significantly improves accuracy in solving complex mathematical problems

qwen-vl-max-2025-01-25

Also known as qwen-vl-max-0125

Upgraded to the Qwen2.5-VL series. The context is extended to 128k, and the image and video understanding capabilities are significantly enhanced

Qwen-VL-Plus

The Qwen-VL-Plus model offers a balance between performance and cost. The following models belong to the Qwen2.5-VL series.

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-vl-plus

Currently has the same capabilities as qwen-vl-plus-2025-08-15

Stable

131,072

129,024

Max 16,384 per image

8,192

$0.115

$0.287

qwen-vl-plus-latest

Always has the same capabilities as the latest snapshot

Latest

qwen-vl-plus-2025-08-15

Also known as qwen-vl-plus-0815
Significantly improved capabilities in object detection and localization, and multilingual processing

Snapshot

qwen-vl-plus-2025-07-10

Also known as qwen-vl-plus-0710
Further improves the ability to understand content from monitoring videos

32,768

30,720

Max 16,384 per image

$0.022

$0.216

qwen-vl-plus-2025-05-07

Also known as qwen-vl-plus-0507
Significantly improves the ability to understand mathematics, reasoning, and content from monitoring videos

131,072

129,024

Max 16,384 per image

$0.216

$0.646

qwen-vl-plus-2025-01-25

Also known as qwen-vl-plus-0125

Upgraded to the Qwen2.5-VL series. The context is extended to 128k, and the image and video understanding capabilities are significantly enhanced

Historical snapshots

Qwen-VL-Max

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-vl-max-2024-12-30

Also known as qwen-vl-max-1230

Snapshot

32,768

30,720

Max 16,384 per image

2,048

$0.431

$1.291

qwen-vl-max-2024-11-19

Also known as qwen-vl-max-1119

qwen-vl-max-2024-10-30

Also known as qwen-vl-max-1030

$2.868

qwen-vl-max-2024-08-09

Also known as qwen-vl-max-0809

Qwen-VL-Plus

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-vl-plus-2025-01-02

Also known as qwen-vl-plus-0102

Snapshot

32,768

30,720

Max 16,384 per image

2,048

$0.216

$0.646

qwen-vl-plus-2024-08-09

Also known as qwen-vl-plus-0809

Qwen-OCR

Qwen-OCR is a specialized model for text extraction. Compared with the Qwen-VL model, Qwen-OCR is more suitable for extracting text from images of documents, tables, test questions, handwritten notes, and other sources. It recognizes multiple languages, such as English, French, Japanese, Korean, German, Russian, and Italian. Usage | API reference | Try it online

Model

Version

Context window

Maximum input

Maximum output

Unit price for input and output

(Tokens)

(Million tokens)

qwen-vl-ocr

Provides the same capabilities as qwen-vl-ocr-2025-04-13.

Stable

34,096

30,000

A maximum of 30,000 tokens per image.

4,096

$0.717

qwen-vl-ocr-latest

Always provides the same capabilities as the latest snapshot.

Latest

qwen-vl-ocr-2025-04-13

Also known as qwen-vl-ocr-0413.
Provides significantly improved text recognition, six built-in OCR tasks, and features such as custom prompts and image rotation correction.

Snapshot

qwen-vl-ocr-2024-10-28

Also known as qwen-vl-ocr-1028.

Snapshot

Qwen-Math

The Qwen-Math model is a language model specialized for solving mathematical problems. Usage | API reference | Try it online

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-math-plus

Equivalent to qwen-math-plus-2024-09-19.

Stable

4,096

3,072

3,072

$0.574

$1.721

qwen-math-plus-latest

Equivalent to the latest snapshot.

Latest

qwen-math-plus-2024-09-19

Also known as qwen-math-plus-0919.

Snapshot

qwen-math-plus-2024-08-16

Also known as qwen-math-plus-0816.

qwen-math-turbo

Equivalent to qwen-math-turbo-2024-09-19.

Stable

$0.287

$0.861

qwen-math-turbo-latest

Equivalent to the latest snapshot.

Latest

qwen-math-turbo-2024-09-19

Also known as qwen-math-turbo-0919.

Snapshot

Qwen-Coder

This is the Qwen code model. The latest Qwen3-Coder-Plus series model is a code generation model based on Qwen3. It has powerful coding agent capabilities, excels at tool calling and environment interaction, and can perform autonomous programming. It combines excellent coding skills with general-purpose abilities. Usage | API reference | Try it online

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen3-coder-plus

Currently has the same capabilities as qwen3-coder-plus-2025-07-22

Stable

1,000,000

997,952

65,536

Tiered pricing applies, see the description below the table.

qwen3-coder-plus-2025-09-23

Snapshot

qwen3-coder-plus-2025-07-22

Snapshot

qwen3-coder-flash

Currently has the same capabilities as qwen3-coder-flash-2025-07-28

Stable

qwen3-coder-flash-2025-07-28

Snapshot

These models use tiered pricing based on the number of input tokens per request (left-open, right-closed intervals).

qwen3-coder-plus series

The prices for qwen3-coder-plus, qwen3-coder-plus-2025-09-23, and qwen3-coder-plus-2025-07-22 are as follows. qwen3-coder-plus supports context cache.

Number of input tokens

Input price (Million tokens)

Output price (Million tokens)

0–32K

$0.574

$2.294

32K–128K

$0.861

$3.441

128K–256K

$1.434

$5.735

256K–1M

$2.868

$28.671

qwen3-coder-flash series

The prices for qwen3-coder-flash and qwen3-coder-flash-2025-07-28 are as follows. qwen3-coder-flash supports context cache.

Number of input tokens

Input price (Million tokens)

Output price (Million tokens)

0–32K

$0.144

$0.574

32K–128K

$0.216

$0.861

128K–256K

$0.359

$1.434

256K–1M

$0.717

$3.584

Previous versions

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-coder-plus

Currently has the same capabilities as qwen-coder-plus-2024-11-06

Stable

131,072

129,024

8,192

$0.502

$1.004

qwen-coder-plus-latest

Has the same capabilities as the latest snapshot version of qwen-coder-plus

Latest

qwen-coder-plus-2024-11-06

Also known as qwen-coder-plus-1106

Snapshot

qwen-coder-turbo

Currently has the same capabilities as qwen-coder-turbo-2024-09-19

Stable

131,072

129,024

8,192

$0.287

$0.861

qwen-coder-turbo-latest

Has the same capabilities as the latest snapshot version of qwen-coder-turbo

Latest

qwen-coder-turbo-2024-09-19

Also known as qwen-coder-turbo-0919

Snapshot

Qwen-MT

This flagship large translation model is a comprehensive upgrade of Qwen 3. It supports translation between 92 languages, including Chinese, English, Japanese, Korean, French, Spanish, German, Thai, Indonesian, Vietnamese, and Arabic. With significantly improved performance and translation quality, the model provides more stable term customization, format retention, and domain-specific prompting capabilities for more accurate and natural translations. Usage | Try online

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-mt-plus

Part of Qwen3-MT

16,384

8,192

8,192

$0.259

$0.775

qwen-mt-turbo

Part of Qwen3-MT

$0.101

$0.280

Qwen-ASR

Based on the Qwen multimodal model, Qwen-ASR supports multilingual recognition, singing recognition, customized speech recognition, and noise rejection. Usage

Model

Version

Supported languages

Supported sample rate

Unit price

qwen3-asr-flash

Offers the same capabilities as qwen3-asr-flash-2025-09-08.

Stable

Chinese (Mandarin, Sichuanese, Min Nan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, and Spanish

16 kHz

$0.000032/second

qwen3-asr-flash-2025-09-08

Snapshot

Qwen data mining model

The Qwen data mining model extracts structured information from documents for applications such as data annotation and content moderation. UsageAPI reference

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-doc-turbo

131,072

129,024

8,192

$0.087

$0.144

Qwen deep research model

The Qwen deep research model breaks down complex problems, performs inference and analysis using web searches, and generates research reports. Usage | API reference | Try online

Model

Context window

Maximum input

Maximum output

Input price

Output price

Free quota

(Tokens)

(Thousand tokens)

qwen-deep-research

1,000,000

997,952

32,768

$0.007742

$0.023367

No free quota

Text generation - Qwen - Open-source

  • In model names, `xxb` indicates the number of parameters. For example, `qwen2-72b-instruct` has 72 billion (72B) parameters.

  • Model Studio supports calls to the open-source editions of Qwen, so you do not need to deploy the models locally. For open-source editions, we recommend using the Qwen3 or Qwen2.5 models.

Qwen3

The qwen3-next-80b-a3b-thinking model, released in September 2025, supports only thinking mode. It features improved instruction-following capabilities compared to the qwen3-235b-a22b-thinking-2507 model, resulting in more concise summary responses.

The qwen3-next-80b-a3b-instruct model, released in September 2025, supports only non-thinking mode. It offers enhanced Chinese understanding, logical reasoning, and text generation capabilities compared to qwen3-235b-a22b-instruct-2507.

The qwen3-235b-a22b-thinking-2507 and qwen3-30b-a3b-thinking-2507 models, released in July 2025 and supporting only the thinking mode, are upgrades to the thinking mode of the qwen3-235b-a22b and qwen3-30b-a3b models.

The qwen3-235b-a22b-instruct-2507 and qwen3-30b-a3b-instruct-2507 models, released in July 2025 and supporting only the non-thinking mode, are upgrades to the non-thinking mode of the qwen3-235b-a22b and qwen3-30b-a3b models.

The Qwen3 model, released in April 2025, supports thinking and non-thinking modes. You can switch between the two modes using the enable_thinking parameter. In addition, the Qwen3 model features significant enhancements to its capabilities:

  1. Inference capability: In evaluations of math, code, and logical reasoning, it significantly surpasses QwQ and other models of the same size, which reaches the top tier in the industry for its scale.

  2. Human preference capability: Its capabilities for creative writing, role-play, multi-turn conversation, and instruction following have been greatly improved. Its general capabilities significantly exceed those of other models of the same size.

  3. Agent capability: It achieves industry-leading levels in both thinking and non-thinking modes and can accurately call external tools.

  4. Multilingual capability: It supports over 100 languages and dialects, with significant improvements in multilingual translation, instruction understanding, and common-sense reasoning.

    Supported languages

    English

    Simplified Chinese

    Traditional Chinese

    French

    Spanish

    Arabic uses the Arabic alphabet and is an official language in many Arab countries.

    Russian uses the Cyrillic alphabet and is an official language in Russia and some other countries.

    Portuguese uses the Latin alphabet and is the official language of Portugal, Brazil, and other Portuguese-speaking countries.

    German uses the Latin alphabet and is the official language in countries such as Germany and Austria.

    Italian uses the Latin alphabet and is the official language in Italy, San Marino, and parts of Switzerland.

    Dutch uses the Latin alphabet and is an official language in the Netherlands, parts of Belgium (the Flemish Region), and Suriname.

    Danish uses the Latin alphabet and is the official language of Denmark.

    Irish uses the Latin alphabet and is one of the official languages of Ireland.

    Welsh uses the Latin alphabet and is one of the official languages of Wales.

    Finnish uses the Latin alphabet and is an official language of Finland.

    Icelandic uses the Latin alphabet and is the official language of Iceland.

    Swedish uses the Latin alphabet and is the official language of Sweden.

    Norwegian Nynorsk uses the Latin alphabet, is an official language of Norway, and is used alongside Bokmål.

    Norwegian Bokmål uses the Latin alphabet and is a major written language in Norway.

    Japanese uses Japanese characters and is the official language of Japan.

    Korean uses the Hangul script and is the official language of South Korea and North Korea.

    Vietnamese uses the Latin alphabet and is the official language of Vietnam.

    Thai uses the Thai alphabet and is the official language of Thailand.

    Indonesian uses the Latin alphabet and is the official language of Indonesia.

    Malay uses the Latin alphabet and is the primary language of Malaysia and other regions.

    Burmese uses the Burmese alphabet and is the official language of Myanmar.

    Tagalog uses the Latin alphabet and is one of the main languages of the Philippines.

    Khmer uses the Khmer script and is the official language of Cambodia.

    Lao uses the Lao script and is the official language of Laos.

    Hindi uses the Devanagari script and is one of the official languages of India.

    Bengali uses the Bengali script and is the official language of Bangladesh and the Indian state of West Bengal.

    Urdu uses the Arabic script, is one of the official languages of Pakistan, and is also spoken in India.

    Nepali uses the Devanagari script and is the official language of Nepal.

    Hebrew uses the Hebrew alphabet and is the official language of Israel.

    Turkish uses the Latin alphabet and is the official language of Türkiye and Northern Cyprus.

    Persian uses the Arabic script and is the official language in countries such as Iran and Tajikistan.

    Polish uses the Latin alphabet and is the official language of Poland.

    Ukrainian uses the Cyrillic alphabet and is the official language of Ukraine.

    Czech uses the Latin alphabet and is the official language of the Czech Republic.

    Romanian uses the Latin alphabet and is the official language of Romania and Moldova.

    Bulgarian uses the Cyrillic alphabet and is the official language of Bulgaria.

    Slovak uses the Latin alphabet and is the official language of Slovakia.

    Hungarian uses the Latin alphabet and is the official language of Hungary.

    Slovenian uses the Latin alphabet and is the official language of Slovenia.

    Latvian uses the Latin alphabet and is the official language of Latvia.

    Estonian uses the Latin alphabet and is the official language of Estonia.

    Lithuanian uses the Latin alphabet and is the official language of Lithuania.

    Belarusian uses the Cyrillic alphabet and is one of the official languages of Belarus.

    Greek uses the Greek alphabet and is the official language of Greece and Cyprus.

    Croatian uses the Latin alphabet and is the official language of Croatia.

    Macedonian uses the Cyrillic alphabet and is the official language of North Macedonia.

    Maltese uses the Latin alphabet and is an official language of Malta.

    Serbian uses the Cyrillic alphabet and is the official language of Serbia.

    Bosnian uses the Latin alphabet and is an official language of Bosnia and Herzegovina.

    Georgian uses the Georgian script and is the official language of Georgia.

    Armenian uses the Armenian alphabet and is the official language of Armenia.

    North Azerbaijani uses the Latin alphabet and is the official language of Azerbaijan.

    Kazakh uses the Cyrillic alphabet and is the official language of Kazakhstan.

    Northern Uzbek uses the Latin alphabet and is the official language of Uzbekistan.

    Tajik uses the Cyrillic alphabet and is the official language of Tajikistan.

    Swahili uses the Latin alphabet and is a lingua franca or an official language in many East African countries.

    Afrikaans uses the Latin alphabet and is spoken mainly in South Africa and Namibia.

    Cantonese uses Traditional Chinese characters and is a major language in China's Guangdong province, and in Hong Kong and Macau.

    Luxembourgish uses the Latin alphabet, is one of the official languages of Luxembourg, and is also spoken in parts of Germany.

    Limburgish uses the Latin alphabet and is spoken mainly in the Netherlands, Belgium, and parts of Germany.

    Catalan uses the Latin alphabet and is spoken in Catalonia and other parts of Spain.

    Galician uses the Latin alphabet and is spoken mainly in the Galicia region of Spain.

    Asturian uses the Latin alphabet and is spoken mainly in the Asturias region of Spain.

    Basque uses the Latin alphabet, is spoken mainly in the Basque Country of Spain and France, and is an official language of the Basque Autonomous Community in Spain.

    Occitan uses the Latin alphabet and is spoken mainly in southern France.

    Venetian uses the Latin alphabet and is spoken mainly in the Veneto region of Italy.

    Sardinian uses the Latin alphabet and is spoken mainly on the island of Sardinia, Italy.

    Sicilian uses the Latin alphabet and is spoken mainly on the island of Sicily, Italy.

    Friulian uses the Latin alphabet and is spoken mainly in the Friuli-Venezia Giulia region of Italy.

    Lombard uses the Latin alphabet and is spoken mainly in the Lombardy region of Italy.

    Ligurian uses the Latin alphabet and is spoken mainly in the Liguria region of Italy.

    Faroese uses the Latin alphabet, is an official language of the Faroe Islands, and is spoken mainly there.

    Tosk Albanian uses the Latin alphabet and is the primary dialect of southern Albania.

    Silesian uses the Latin alphabet and is spoken mainly in Poland.

    Bashkir uses the Cyrillic alphabet and is spoken mainly in the Republic of Bashkortostan, Russia.

    Tatar uses the Cyrillic alphabet and is spoken mainly in the Republic of Tatarstan, Russia.

    Mesopotamian Arabic uses the Arabic script and is spoken mainly in Iraq.

    Najdi Arabic uses the Arabic script and is spoken mainly in the Najd region of Saudi Arabia.

    Egyptian Arabic uses the Arabic script and is spoken mainly in Egypt.

    Levantine Arabic uses the Arabic script and is spoken mainly in Syria and Lebanon.

    Ta'izzi-Adeni Arabic uses the Arabic script and is spoken mainly in Yemen and the Hadhramaut region of Saudi Arabia.

    Dari uses the Arabic script and is one of the official languages of Afghanistan.

    Tunisian Arabic uses the Arabic script and is spoken mainly in Tunisia.

    Moroccan Arabic uses the Arabic script and is spoken mainly in Morocco.

    Kabuverdianu Creole uses the Latin alphabet and is spoken mainly in Cape Verde.

    Tok Pisin uses the Latin alphabet and is one of the official languages of Papua New Guinea.

    Yiddish (Eastern Yiddish) uses the Hebrew alphabet and is spoken mainly by Jewish communities.

    Sindhi uses the Arabic script and is an official language of the Sindh province of Pakistan.

    Sinhala uses the Sinhala alphabet and is one of the official languages of Sri Lanka.

    Telugu uses the Telugu script and is an official language of the Indian states of Andhra Pradesh and Telangana.

    Punjabi uses the Gurmukhi script, is spoken in the Indian state of Punjab, and is one of the official languages of India.

    Tamil uses the Tamil script and is an official language of the Indian state of Tamil Nadu and of Sri Lanka.

    Gujarati uses the Gujarati script and is an official language of the Indian state of Gujarat.

    Malayalam uses the Malayalam script and is an official language of the Indian state of Kerala.

    Marathi uses the Devanagari script and is an official language of the Indian state of Maharashtra.

    Kannada uses the Kannada script and is an official language of the Indian state of Karnataka.

    Magahi uses the Devanagari script and is spoken mainly in the Indian state of Bihar.

    Oriya uses the Odia script and is an official language of the Indian state of Odisha.

    Awadhi uses the Devanagari script and is spoken mainly in the Indian state of Uttar Pradesh.

    Maithili uses the Devanagari script, is spoken in the Indian state of Bihar and the Terai region of Nepal, and is one of the official languages of India.

    Assamese uses the Bengali script and is an official language of the Indian state of Assam.

    Chhattisgarhi uses the Devanagari script and is spoken mainly in the Indian state of Chhattisgarh.

    Bhojpuri uses the Devanagari script and is spoken in parts of India and Nepal.

    Minangkabau uses the Latin alphabet and is spoken mainly on the island of Sumatra in Indonesia.

    Balinese uses the Latin alphabet and is spoken mainly on the island of Bali in Indonesia.

    Javanese uses the Latin alphabet, although its traditional script is also commonly used. It is widely spoken on the island of Java in Indonesia.

    Banjar uses the Latin alphabet and is spoken mainly on the island of Kalimantan in Indonesia.

    Sundanese uses the Latin alphabet and its traditional script, and is spoken mainly in western Java, Indonesia.

    Cebuano uses the Latin alphabet and is spoken mainly in the Cebu region of the Philippines.

    Pangasinan uses the Latin alphabet and is spoken mainly in the Pangasinan province of the Philippines.

    Iloko uses the Latin alphabet and is spoken mainly in the Philippines.

    Waray (Philippines) uses the Latin alphabet and is spoken mainly in the Philippines.

    Haitian Creole uses the Latin alphabet and is one of the official languages of Haiti.

    Papiamento uses the Latin alphabet and is spoken mainly on Caribbean islands such as Aruba and Curaçao.

  5. Response format: Addresses response format issues from previous versions, such as malformed Markdown, truncated responses, and incorrect `boxed` output.

The Qwen3 open-source model, scheduled for release in April 2025, supports only streaming output in thinking mode.

Thinking Mode | Non-thinking Mode | API Reference

Model

Mode

Context window

Maximum input

Maximum chain-of-thought

Maximum response

Input price

Output price

(Tokens)

(Million tokens)

qwen3-next-80b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.144

$1.434

qwen3-next-80b-a3b-instruct

Non-thinking only

129,024

-

$0.574

qwen3-235b-a22b-thinking-2507

Thinking only

126,976

81,920

$0.287

$2.868

qwen3-235b-a22b-instruct-2507

Non-thinking only

129,024

-

$1.147

qwen3-30b-a3b-thinking-2507

Thinking only

126,976

81,920

$0.108

$1.076

qwen3-30b-a3b-instruct-2507

Non-thinking only

129,024

-

$0.431

qwen3-235b-a22b

Non-thinking

129,024

-

16,384

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-32b

Non-thinking

129,024

-

$0.287

$1.147

Thinking

98,304

38,912

$2.868

qwen3-30b-a3b

Non-thinking

129,024

-

$0.108

$0.431

Thinking

98,304

38,912

$1.076

qwen3-14b

Non-thinking

129,024

-

8,192

$0.144

$0.574

Thinking

98,304

38,912

$1.434

qwen3-8b

Non-thinking

129,024

-

$0.072

$0.287

Thinking

98,304

38,912

$0.717

qwen3-4b

Non-thinking

129,024

-

$0.044

$0.173

Thinking

98,304

38,912

$0.431

qwen3-1.7b

Non-thinking

32,768

30,720

-

$0.173

Thinking

28,672

The sum of this value and the input must not exceed 30,720.

$0.431

qwen3-0.6b

Non-thinking

30,720

-

$0.173

Thinking

28,672

The combined value of this item and the input cannot exceed 30,720.

$0.431

For the Qwen3 model, if thinking mode is enabled but no thinking process is generated, you are charged the non-thinking mode price.

QwQ-Open source

The QwQ reasoning model is trained on the Qwen2.5-32B model and uses reinforcement learning to significantly improve its inference capabilities. The model's performance matches that of the full version of DeepSeek-R1 on core math and code metrics, such as AIME 24/25 and LiveCodeBench, and on general metrics, such as IFEval and LiveBench. Its performance on all metrics significantly exceeds that of DeepSeek-R1-Distill-Qwen-32B, another model based on the Qwen2.5-32B model. Usage | API reference

Model

Context window

Maximum input

Maximum chain-of-thought

Maximum response

Input price

Output price

(Tokens)

(Million tokens)

qwq-32b

131,072

98,304

32,768

8,192

$0.287

$0.861

QwQ-Preview

The qwq-32b-preview model is an experimental model developed by the Qwen team in 2024. It is designed to enhance AI inference capabilities, particularly in mathematics and programming. For information about the model's limitations, see the official QwQ blog. Usage | API Reference | Try online

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwq-32b-preview

32,768

30,720

16,384

$0.287

$0.861

Qwen2.5

Qwen2.5 is a series of Qwen large language models that includes base and instruction-tuned models with parameter sizes ranging from 500 million to 72 billion. Qwen2.5 offers the following improvements over Qwen2:

  • It is pre-trained on a large-scale dataset of up to 18 trillion tokens.

  • It has a significantly expanded knowledge base and greatly improved encoding and math abilities.

  • It has significant improvements in following instructions, generating long text (over 8K tokens), understanding structured data such as tables, and generating structured outputs, especially JSON. The model is more resilient to diverse system prompts, which enhances chatbot role-play and conditional settings.

  • It supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.

Usage | API reference | Try online

Model name

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen2.5-14b-instruct-1m

1,000,000

1,000,000

8,192

$0.144

$0.431

qwen2.5-7b-instruct-1m

$0.072

$0.144

qwen2.5-72b-instruct

131,072

129,024

$0.574

$1.721

qwen2.5-32b-instruct

$0.287

$0.861

qwen2.5-14b-instruct

$0.144

$0.431

qwen2.5-7b-instruct

$0.072

$0.144

qwen2.5-3b-instruct

32,768

30,720

$0.044

$0.130

qwen2.5-1.5b-instruct

Free for a limited time

qwen2.5-0.5b-instruct

QVQ

The qvq-72b-preview model is an experimental model from the Qwen team that focuses on improving visual reasoning, particularly for mathematical inference. For more information about the model's limitations, see the official QVQ blog. Usage | API reference

To have the model output its thinking process before the final answer, you can use the QVQ commercial model.

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qvq-72b-preview

32,768

16,384

Maximum 16,384 for a single image

16,384

$1.721

$5.161

Qwen-Omni

Qwen-Omni is a large multimodal model trained on Qwen2.5. It understands text, image, audio, and video inputs. The model can simultaneously stream text and audio outputs and provides significantly faster multimodal content understanding. Usage | API Reference

Model

Context window

Maximum input

Maximum output

Tokens

qwen2.5-omni-7b

32,768

30,720

2,048

Billing for inputs and outputs is as follows:

Billing Item

Price (Million tokens)

Text

$0.087

Audio

$5.448

Image or video

$0.287

Output

Price (Million tokens)

Text

$0.345 (for text-only input)

$0.861 (for input that includes images, audio, or video)

Text and audio

$10.895 (for audio)

Text output is not billed.

Billing example: If a request has an input of 1,000 text tokens and 1,000 image tokens, and an output of 1,000 text tokens and 1,000 audio tokens, the total cost is $0.000087 (text input) + $0.000287 (image input) + $0.010895 (audio output).

Qwen-VL

This is the open-source version of Alibaba Cloud's Qwen-VL. Usage | API reference

The Qwen3-VL model offers significant improvements over Qwen2.5-VL:

  • Agent interaction: It operates computer or mobile phone interfaces, detects GUI elements, understands features, and invokes tools to perform tasks. It achieves top-tier performance in evaluations such as OS World.

  • Visual encoding: It generates code from images or videos. This feature can be used to create HTML, CSS, and JS code from design drafts or website screenshots.

  • Spatial intelligence: It supports 2D and 3D positioning and accurately determines object orientation, perspective changes, and occlusion relationships.

  • Long video understanding: It understands video content up to 20 minutes long and can pinpoint specific moments with second-level accuracy.

  • Deep thinking: It excels at capturing details and analyzing causality, achieving top-tier performance in evaluations such as MathVista and MMMU.

  • OCR: It supports 33 languages and performs more stably in scenarios with complex lighting, blur, or tilt. It also significantly improves the accuracy of recognizing rare characters, ancient script, and technical terms.

    Supported languages

    The model supports the following 33 languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Türkiye, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi/Devanagari, and Hebrew.

Qwen3-VL

Model

Mode

Context window

Maximum input

Maximum CoT

Maximum response

Input price

Output price

CoT + responses

Free quota

(Note)

(Tokens)

(1,000 tokens)

qwen3-vl-30b-a3b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.108

$1.076

No free quota

qwen3-vl-30b-a3b-instruct

Non-thinking only

129,024

-

$0.431

qwen3-vl-235b-a22b-thinking

Thinking only

131,072

126,976

81,920

32,768

$0.286705

$2.867051

qwen3-vl-235b-a22b-instruct

Non-thinking only

129,024

-

$1.146820

Qwen2.5-VL

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen2.5-vl-72b-instruct 

131,072

129,024

Max 16,384 per image

8,192

$2.294

$6.881

qwen2.5-vl-32b-instruct

$1.147

$3.441

qwen2.5-vl-7b-instruct

$0.287

$0.717

qwen2.5-vl-3b-instruct

$0.173

$0.517

qwen2-vl-72b-instruct

32,768

30,720

Max 16,384 per image

2,048

$2.294

$6.881

qwen2-vl-7b-instruct

32,000

30,000

Max 16,384 per image

2,000

Free for a limited time

qwen2-vl-2b-instruct

Qwen-Math

Qwen2.5-Math, a language model based on Qwen, is designed to solve math problems. It supports Chinese and English and integrates multiple inference methods, such as Chain of Thought (CoT), Program of Thought (PoT), and Tool-Integrated Reasoning (TIR). Usage | API reference | Try online

Model name

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen2.5-math-72b-instruct

4,096

3,072

3,072

$0.574

$1.721

qwen2.5-math-7b-instruct

$0.144

$0.287

qwen2.5-math-1.5b-instruct

Free for a limited time

Qwen-Coder

Qwen-Coder is an open-source code model from Qwen. The latest model, qwen3-coder-480b-a35b-instruct, is a code generation model built on Qwen3. It has powerful agent capabilities for coding and excels at tool calling and environment interaction. The model supports autonomous programming and combines advanced coding skills with general-purpose abilities. Usage | API reference | Try online

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen3-coder-480b-a35b-instruct

262,144

204,800

65,536

Tiered pricing. See the notes below.

qwen3-coder-30b-a3b-instruct

qwen2.5-coder-32b-instruct

131,072

129,024

8,192

$0.287

$0.861

qwen2.5-coder-14b-instruct

qwen2.5-coder-7b-instruct

$0.144

$0.287

qwen2.5-coder-3b-instruct

32,768

30,720

Limited-time free trial

qwen2.5-coder-1.5b-instruct

qwen2.5-coder-0.5b-instruct

Billing for qwen3-coder-480b-a35b-instruct and qwen3-coder-30b-a3b-instruct is tiered based on the number of input tokens per request (left-open, right-closed intervals).

Model

Input tokens

Input price (Million tokens)

Output price (Million tokens)

qwen3-coder-480b-a35b-instruct

0-32K

$0.861

$3.441

32K-128K

$1.291

$5.161

128K-200K

$2.151

$8.602

qwen3-coder-30b-a3b-instruct

0-32K

$0.216

$0.861

32K-128K

$0.323

$1.291

128K-200K

$0.538

$2.151

Text generation - third-party models

DeepSeek

DeepSeek is an LLM series from the DeepSeek company. API reference | Try online

Model

Context window

Maximum input

Maximum CoT

Maximum response

Input price

Output price

(tokens)

(Million tokens)

deepseek-v3.1

A 685B full-parameter model.

65,536

32,768

98,304

131,072

$0.574

$1.721

deepseek-r1

A 685B full-parameter model.

16,384

$2.294

deepseek-r1-0528

A 685B full-parameter model.

deepseek-v3

A 671B full-parameter model.

65,536

57,344

Not applicable

8,192

$0.287

$1.147

deepseek-r1-distill-qwen-1.5b

Based on Qwen2.5-Math-1.5B.

32,768

32,768

16,384

16,384

Limited-time free trial

deepseek-r1-distill-qwen-7b

Based on Qwen2.5-Math-7B.

$0.072

$0.144

deepseek-r1-distill-qwen-14b

Based on Qwen2.5-14B.

$0.144

$0.431

deepseek-r1-distill-qwen-32b

Based on Qwen2.5-32B.

$0.287

$0.861

deepseek-r1-distill-llama-8b

Based on Llama-3.1-8B.

Limited-time free trial

deepseek-r1-distill-llama-70b

Based on Llama-3.3-70B.

Kimi

Kimi-K2, developed by Moonshot AI, is the first open-source trillion-parameter Mixture of Experts (MoE) model from China. It has 32 billion active parameters and excels at encoding and tool calling. Usage | Try online

Model

Context window

Input price

Output price

(Tokens)

(Million tokens)

Moonshot-Kimi-K2-Instruct

131,072

$0.574

$2.294

Image generation

Qwen text-to-image

This model excels at complex text rendering, especially for the Chinese and English languages. Currently, qwen-image-plus and qwen-image have the same capabilties, but qwen-image-plus has lower price. API reference.

Model

Unit price

qwen-image-plus

$0.028671 per image

qwen-image

$0.035 per image

Input prompt

Output image

Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title “Come Play Ball!” is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle “Come [Show Off Your Skills]!” appears in green font. A speech bubble adds playful charm with the text: “Hehe, watch me amaze my little friends next!” At the bottom, supplementary text reads: “We get to play ball with our friends again!” The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere.

image

Qwen image editing

The Qwen image editing model offers a wide range of features for advanced image and text editing. You can perform precise text editing in Chinese and English, adjust colors, enhance details, apply style transfers, add or delete objects, and modify positions and actions. API reference.

Model

Price

qwen-image-edit

$0.043 per image

dog_and_girl (1)

Original image

狗修改图

Change the person's pose to bending over and holding the dog's front paws.

image

Original image

image

Replace the words 'HEALTH INSURANCE' on the letter blocks with '明天会更好' (Tomorrow will be better).

5

Original image

5out

Replace the polka-dot shirt with a light blue shirt.

6

Original image

6out

Change the background to Antarctica.

7

Original image

7out

Generate a cartoon profile picture of the person.

image

Original image

image

Remove the hair from the plate.

Qwen image translation

The Qwen image translation model translates text in images from 11 languages into Chinese or English, accurately preserving the original layout and content while offering custom features such as glossary definitions, sensitive word filtering, and image entity detection. API reference.

Model

Unit price

qwen-mt-image

$0.000431/image

en

Original image

ja

Japanese

es

Portuguese

ar

Arabic

Wan Text-to-Image

Text-to-Image V2

The V2 series features advanced text-to-image models to generate images from text. API reference | Online Experience

Model

Description

Unit price

wan2.5-t2i-preview Recommended

Wan 2.5 preview removes the single-side limitation, allowing users to freely select image dimensions within the constraints of total pixel area and aspect ratio.

$0.028671/image

wan2.2-t2i-plus Recommended

Wan 2.2 professional edition. Offers comprehensive upgrades in creativity, stability, and realistic textures.

$0.02007/image

wan2.2-t2i-flash Recommended

Wan 2.2 speed edition. Offers comprehensive upgrades in creativity, stability, and realistic textures.

$0.028671/image

wanx2.1-t2i-plus

Wan 2.1 professional edition. Generates highly detailed images in multiple styles.

$0.028671/image

wanx2.1-t2i-turbo

Wan 2.1 turbo edition. Generates images quickly in multiple styles.

$0.020070/image

wanx2.0-t2i-turbo

Wan 2.0 turbo edition. Excels at creating high-texture portraits and creative designs. This model is highly cost-effective.

$0.005735/image

Scenario 1: Text generation

Prompt: Generate a New Year's greeting card with a snowy background, children setting off firecrackers, a snake forming the number 2025, and the text "HAPPY NEW YEAR".

Comparison: The v2.2 model generates text more effectively and is ideal for creative designs.

wan2.2-t2i-plus

wanx2.1-t2i-plus

wanx2.1-t2i-turbo

wanx2.0-t2i-turbo

image

47ebac80ff34442ab070b1f201c59a45_0

image

image

Scenario 2: Portrait generation

Prompt: Chinese girl, round face, looking at the camera, elegant ethnic clothing, commercial photography, outdoor, cinematic lighting, medium close-up shot, delicate light makeup, sharp edges.

Effect comparison: The 2.2 model offers improved image stability, while the 2.0 model excels at generating high-quality portraits. Both are excellent choices.

wan2.2-t2i-plus

wanx2.1-t2i-plus

wanx2.1-t2i-turbo

wanx2.0-t2i-turbo

image

fca92c863b3b41e6b6569c008e272592_3

image

image

Wan2.5 general image editing

The Wan2.5 general image editing model supports entity-consistent image editing and multi-image fusion. It accepts text, a single image, or multiple images as input. API reference.

模型名称

单价

wan2.5-i2i-preview

$0.028671/张

Feature

Input example

Output image

Single-image editing

damotest2023_Portrait_photography_outdoors_fashionable_beauty_409ae3c1-19e8-4515-8e50-b3c9072e1282_2-转换自-png

Replace the floral dress with a vintage-style lace gown that has delicate embroidery on the collar and cuffs.

a26b226d-f044-4e95-a41c-d1c0d301c30b-转换自-png

Multi-image fusion

图像编辑2图像编辑2

Place the alarm clock from Image 1 next to the vase on the dining table in Image 2.

图像编辑2

Wan 2.1 general image editing

The Wan general image editing model performs diverse image edits based on simple instructions and is suitable for applications such as image outpainting, watermark removal, style transfer, image restoration, and image enhancement .UsageAPI reference

Model

Unit price

wanx2.1-imageedit

$0.020070 per image

General image editing currently supports the following features:

Feature

Input image

Input prompt

Output image

Global stylization

image

Apply a French picture book style.

image

Local stylization

image

Change the house to a wooden-plank style.

image

Instruction-based editing

image

Change the girl's hair to red.

image

Inpainting

Input image

image

Masked image (The white area is the mask)

image

A ceramic rabbit holding a ceramic flower.

Output image

image

Text watermark removal

image

Remove the text from the image.

image

Outpainting

20250319105917

A green fairy.

image

Image super resolution

Low-resolution image

image

Image super resolution.

High-resolution image

image

Image colorization

image

Blue background, yellow leaves.

image

Line art to image

image

A minimalist, Nordic-style living room.

image

Underlay graph

image

A cartoon character cautiously peeking its head out, looking at a brilliant blue gem inside a channel.

image

OutfitAnyone

  • The OutfitAnyone-Plus Edition offers higher image definition, finer clothing texture details, and better logo restoration than the Basic Edition model. However, the longer image generation time makes this edition suitable for scenarios that are not time-sensitive. API reference | Try online

  • The OutfitAnyone-Image Parsing service parses model and clothing images for pre-processing and post-processing. API reference

Model

Description

Sample Input

Sample Output

aitryon-plus

OutfitAnyone-Plus

output26

output29

aitryon-parsing-v1

Image parsing for OutfitAnyone

OutfitAnyone billing

Model Service

Model

Unit Price

Discount

Tier

OutfitAnyone-Plus Edition

aitryon-plus

$0.071677/image

None

None

OutfitAnyone-Image Parsing

aitryon-parsing-v1

$0.000574/image

None

None

Speech synthesis (text-to-speech)

Qwen-TTS

Qwen-TTS, a speech synthesis model from the Qwen series, converts text in Chinese, English, or a mix of both into streaming audio output. Usage | API reference

Model

Version

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-tts

Functionally equivalent to qwen-tts-2025-04-10.

Stable

8,192

512

7,680

$0.230

$1.434

qwen-tts-latest

Functionally equivalent to the latest snapshot.

Latest

qwen-tts-2025-05-22

Snapshot

qwen-tts-2025-04-10

The audio output is tokenized at a rate of 50 tokens per second. Audio clips shorter than 1 second are also counted as 50 tokens.

CosyVoice

CosyVoice is a next-generation large generative model for speech synthesis developed by Tongyi Lab. Powered by large-scale pre-trained language models, it integrates text understanding with speech generation and supports real-time streaming text-to-speech synthesis. Usage | Try online | Voice list

Model

Unit price

cosyvoice-v3-plus

$0.286706 per 10,000 characters

cosyvoice-v3

$0.0573412 per 10,000 characters

cosyvoice-v2

$0.286706 per 10,000 characters

A Chinese character is counted as two characters, while English letters, punctuation marks, and spaces are each counted as one character.

Speech recognition (speech-to-text) and translation (speech-to-translation)

Paraformer

The Paraformer speech recognition service transcribes only spoken content from audio, and you are billed only for this transcribed content. Therefore, the billable duration is typically shorter than the total length of the audio file. Because the service uses AI to interpret audio, minor transcription errors may occur.
By default, only the first track of a multi-track audio file is transcribed and billed. If you enable multi-track transcription, each track is billed separately based on its duration.
The actual billing duration is specified in the content_duration field of the response.

Audio file recognition

API Reference | Online Demo

Model

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

paraformer-v2

Mandarin Chinese, Chinese dialects (Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, and Shanghainese), English, Japanese, Korean, German, French, and Russian

Any

ApsaraVideo Live

aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, and wmv

$0.000012/second

paraformer-8k-v2

Mandarin Chinese

8 kHz

Telephony

Real-time speech recognition

API reference | Online Demo

Model

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

paraformer-realtime-v2

Mandarin Chinese, Chinese dialects (Cantonese, Wu, Minnan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, and Shanghainese), English, Japanese, Korean, German, French, and Russian

Supports real-time language switching.

Any

ApsaraVideo Live, online meetings, and other real-time applications.

pcm, wav, mp3, opus, speex, aac, and amr

$0.000035 per second

paraformer-realtime-8k-v2

8 kHz

Telephone customer service and other telephony applications.

Fun-ASR

Fun-ASR is a speech recognition model in the Tongyi Fun series that supports Chinese (Mandarin, Cantonese), English, Japanese, Thai, Vietnamese, and Indonesian.

Model

Version

Supported languages

Supported sample rates

Scenarios

Supported audio formats

Unit price

Free quota

fun-asr-mtl

Same as fun-asr-2025-08-25

Stable version

Chinese (Mandarin, Cantonese), English, Japanese, Thai, Vietnamese, and Indonesian

Any

ApsaraVideo Live, phone calls, and conference interpretation

aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, and wmv

$0.000032/second

None

fun-asr-mtl-2025-08-25

Snapshot version

Video generation: Wan and video editing

Text-to-Video

The Wan text-to-video model generates a video from a single sentence. The resulting video features rich artistic styles and cinematic quality. API reference | Try online

Model

Description

Price

wan2.5-t2v-preview Recommended

Wan 2.5 preview supports automatic dubbing and custom audio files.

480P: $0.043006/second

720P: $0.086012/second

1080P: $0.143353/second

wan2.2-t2v-plus Recommended

Wan 2.2 professional edition. Significantly improves image detail and motion stability.

480p: $0.02007/second

1080p: $0.100347/second

wanx2.1-t2v-turbo

Offers faster generation speed and balanced performance.

$0.034405/second

wanx2.1-t2v-plus

Offers richer details and higher-quality images.

$0.100347/second

Example input

Output video

Prompt: A kitten runs in the moonlight

Image-to-video: first frame

The Wan image-to-video model uses an input image as the first frame and a prompt to generate a video. The resulting video features rich artistic styles and cinematic quality. API reference | Try online

Model

Description

Price

wan2.5-i2v-preview Recommended

Wan 2.5 preview supports automatic dubbing and custom audio files.

480P: $0.043006/second

720P: $0.086012/second

1080P: $0.143353/second

wan2.2-i2v-plus Recommended

Wan 2.2 professional edition. It offers significant improvements in image detail and motion stability.

480p: $0.02007/second

1080p: $0.100347/second

wanx2.1-i2v-turbo

This model offers a faster generation speed and is more cost-effective. It takes only one-third of the time required by the plus model.

$0.034405/second

wanx2.1-i2v-plus

This model offers richer details and higher-quality images.

$0.100347/second

Input example

Output video

Input prompt: A cat runs on the grass

Input image:

image

Output video: The input image serves as the first frame of the video. The remaining frames are generated based on the prompt.

Model: wanx2.1-i2v-turbo.

Image-to-video: first and last frames

The Wan first-and-last-frame video generation model uses a prompt, a first frame image, and a last frame image to generate a smooth, dynamic video. API reference | Try online

Model

Price

wanx2.1-kf2v-plus

$0.100347 per second

Input example

Output video

First frame image

Last frame image

Prompt

first_frame

last_frame

Realistic style. A small black cat looks up at the sky. The camera starts at eye level and gradually rises to a top-down shot of the cat's curious eyes.

General video editing

The Wan unified video editing model supports multimodal inputs, such as text, images, and videos. It can perform video generation and general editing tasks. API reference

Model

Price

wanx2.1-vace-plus

$0.100347 per second

The unified video editing model supports the following features:

Model feature

Input reference image

Input prompt

Output video

Multi-image reference

Reference image 1 (reference entity)

image

Reference image 2 (reference background)

image

A girl walks out from the depths of an ancient, misty forest. She moves with light steps as the camera captures her graceful movements. When she stops to look at the lush woods, a smile of surprise and joy appears on her face. This moment, captured in the interplay of light and shadow, records the wonderful encounter between the girl and nature.

Output video

Video redrawing

A gentleman drives a black, steampunk-style car decorated with gears and copper pipes. The background features a steam-powered candy factory with retro elements, creating a vintage and fun atmosphere.

Local editing

Input video

Input mask image (The white area indicates the editing area)

mask

In a Parisian cafe, a lion in a suit elegantly sips coffee. It holds a coffee cup in one hand and looks content. The cafe is tastefully decorated with soft hues and warm lighting that illuminates the lion.

The content in the editing area is modified according to the prompt.

Video extension

Input initial video segment (1 second)

A dog wearing sunglasses is skateboarding on the street in a 3D cartoon style.

Output extended video (5 seconds)

Video frame expansion

An elegant lady passionately plays the violin with a full symphony orchestra behind her.

Wan - digital human

You can generate a video of a person speaking, singing, or performing with natural movements based on a single character image and an audio file. To use this feature, call the following models in sequence. wan2.2-s2v image detection | wan2.2-s2v video generation

Model

Description

Price

wan2.2-s2v-detect

Checks if an input image meets requirements, such as definition, a single person, and a frontal view.

$0.000574 per image

wan2.2-s2v

Generates a dynamic character video from a validated image and an audio clip.

480p: $0.071677 per second

720p: $0.129018 per second

Sample input

Output video

Input image:

input_image

Input audio:

AnimateAnyone

You can generate a character action video based on a character image and a character action template. To use this feature directly, call the following three models in sequence. AnimateAnyone image detection API details | AnimateAnyone action template generation | AnimateAnyone video generation API details

Model

Description

Price

animate-anyone-detect-gen2

Checks whether the input image meets the requirements.

$0.000574 per image

animate-anyone-template-gen2

Extracts character actions from a character motion video and generates an action template.

$0.011469 per second

animate-anyone-gen2

Generates a character action video based on a character image and an action template.

Input: Character image

Input: Action video

Output: Image background

Output: Video background

04-9_16

Note
  • The preceding examples were generated by the Tongyi App, which integrates AnimateAnyone.

  • The output of the AnimateAnyone model contains only video frames and does not include audio.

EMO

You can generate a dynamic portrait video based on a portrait image and a human voice audio file. To use this feature, call the following models in sequence. EMO image detection | EMO video generation

Model

Description

Price

emo-detect-v1

Checks whether the input image meets the requirements. This model can be called directly without deployment.

$0.000574 per image

emo-v1

Generates a dynamic portrait video. This model can be called directly without deployment.

  • 1:1 aspect ratio video: $0.011469 per second

  • 3:4 aspect ratio video: $0.022937 per second

Inputs: Portrait image and human voice audio file

Outputs: Dynamic portrait video

Portrait image:

上春山

Human voice audio: See the video on the right.

Dynamic portrait video:

Action style level: Active ("style_level": "active")

LivePortrait

You can generate a dynamic portrait video from a portrait image and a human voice audio file quickly and in a lightweight manner. Compared to the EMO model, this model offers faster generation and lower prices, but with lower output quality. To use this feature, call the following two models in sequence. LivePortrait image detection | LivePortrait video generation

Model

Description

Price

liveportrait-detect

Verifies that an input image meets the required specifications.

$0.000574 per image

liveportrait

Generates a dynamic portrait video.

$0.002868 per second

Inputs: A portrait image and a voice audio file

Outputs: A dynamic portrait video

Portrait:

Emoji男孩

Voice audio: See the video on the right.

Portrait video:

Emoji

You can generate a dynamic facial video from a face image and a preset dynamic face template. This feature can be used in scenarios such as creating emojis and generating video materials. To use this feature, call the following models in sequence. Emoji image detection | Emoji video generation

Model

Description

Price

emoji-detect-v1

Checks whether an input image meets specified requirements.

$0.000574 per image

emoji-v1

Generates a character expression from a portrait image that matches a specified emoji template.

$0.011469 per second

Input: Portrait image

Output: Dynamic portrait video

image.png

The template sequence for the "Happy" expression is ("input.driven_id": "mengwa_kaixin").

VideoRetalk

This feature uses a character video and a human voice audio file to generate a new video in which the character's lip movements match the input audio. To use this feature, call the following model. API reference

Model

Description

Price

videoretalk

Generates a new video where a character's lip movements are synchronized with the input audio.

$0.011469 per second

Video restyling

You can generate videos in different styles that match the semantic description of an input text. You can also use this feature to restyle an input video. API reference

Model

Description

Price

video-style-transform

Transforms an input video into various styles, such as Japanese anime and American comics.

720p

$0.071677 per second

540p

$0.028671 per second

Input video

Output video (Japanese anime)

Text embedding

A text embedding model converts text into a numerical representation used for tasks such as search, clustering, recommendation, and classification. Billing for the model is based on the number of input tokens. Synchronous API details.

Model

Embedding dimensions

Batch size

Maximum tokens per row

Supported languages

Price (Million input tokens)

text-embedding-v4

Part of the Qwen3-Embedding series

2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64

10

8,192

Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, more than 100 other major languages, and various programming languages

$0.072

Multimodal embedding

Multimodal embedding models transform text, images, or videos into floating-point vectors to enable applications such as video classification, image classification, and image-text retrieval. API reference.

Model

Data type

Embedding dimension

Price

Rate limit

multimodal-embedding-v1

float(32)

1,024

Free trial

120 requests per minute (RPM)

Text classification, extraction, and ranking

Text Rerank

This feature is typically used for semantic retrieval, which sorts documents by their semantic relevance to a query. API reference.

Model

Maximum number of documents

Maximum input tokens per document

Maximum total input tokens

Supported languages

Price (Million input tokens)

gte-rerank-v2

500

4,000

30,000

Over 50 languages, including Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic

$0.115

  • Maximum tokens per Query or Document: A single Query or Document is limited to 4,000 tokens. Input that exceeds this limit is truncated.

  • Maximum number of Documents: A single request is limited to 500 Documents.

  • Maximum input tokens: The total number of tokens for all Queries and Documents in a single request is limited to 30,000.

Industry

Intention recognition

The Tongyi intention recognition model quickly and accurately parses user intents and selects the appropriate tools to solve user problems, all within a few hundred milliseconds. API reference | Usage

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

tongyi-intent-detect-v3

8,192

8,192

1,024

$0.058

$0.144

Role-playing

The Qwen role-playing model is ideal for creating lifelike conversational experiences in various scenarios, such as virtual social interactions, games with non-player characters (NPCs), and emulating intellectual property (IP) characters. It is also well-suited for integration into hardware, toys, and in-vehicle systems. Compared to other Qwen models, this model provides enhanced persona consistency, conversation progression, and empathetic listening. Usage

Model

Context window

Maximum input

Maximum output

Input price

Output price

(Tokens)

(Million tokens)

qwen-plus-character

32,768

32,000

4,096

$0.115

$0.287