This topic describes the server-side events for the Qwen-Omni Real-time API.
For more information, see Real-time multimodal.
Server-side events
error
The server returns an error message for both client-side and server-side errors. Most errors are recoverable and do not affect the session.
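A minimal sketch of how a client might treat error events as recoverable. It assumes each event arrives as JSON text with a `type` field, and that the details sit in an `error` object with `code` and `message` keys; those field names are assumptions, not confirmed by this topic.

```python
import json


def handle_event(raw: str, errors: list) -> bool:
    """Record an error event and report whether the event was an error.

    Assumes an `error` object with `code` and `message` keys (hypothetical names).
    """
    event = json.loads(raw)
    if event.get("type") != "error":
        return False
    detail = event.get("error") or {}
    errors.append((detail.get("code"), detail.get("message")))
    # Most errors leave the session usable, so the caller keeps reading events.
    return True
```

The caller can log the collected errors and continue its receive loop instead of closing the connection.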
session.created
This is the first event that the server sends after a client connects. The event contains the default configurations for the connection.
session.updated
When a session.update request is processed successfully, the server returns a session.updated event. Otherwise, the server returns an error event.
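The request and confirmation flow could be sketched as follows. The `session` key in the client message and the helper names are assumptions made for illustration; only the event names come from this topic.

```python
import json


def build_session_update(session: dict) -> str:
    """Serialize a session.update client event (the `session` key is assumed)."""
    return json.dumps({"type": "session.update", "session": session})


def update_outcome(event: dict) -> str:
    """Classify a server event received after sending session.update."""
    kind = event.get("type")
    if kind == "session.updated":
        return "applied"
    if kind == "error":
        return "rejected"
    # Any other event is unrelated to the update; keep waiting.
    return "pending"
```

A client would send the string from `build_session_update` and then call `update_outcome` on each incoming event until the result is no longer "pending".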
input_audio_buffer.speech_started
In server_vad mode, the server returns the input_audio_buffer.speech_started event when it detects the start of speech in the audio buffer. This event can occur whenever audio is added to the buffer, unless speech has already been detected.
input_audio_buffer.speech_stopped
In server_vad mode, the server returns the input_audio_buffer.speech_stopped event when it detects the end of speech in the audio buffer. The server also sends a conversation.item.created event, which contains the user message item created from the audio buffer.
input_audio_buffer.committed
In server_vad mode, the server automatically submits the buffer and returns this event when it detects that the user has stopped speaking. In non-server_vad mode, the server returns this event in response to the client's input_audio_buffer.commit event after the client finishes sending audio.
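As an illustration of how the speech and commit events relate, the sketch below replays a sequence of event names and tracks whether the user is currently speaking and how many buffers have been committed. The function name is hypothetical; only the event names are taken from this topic.

```python
def track_turns(event_types):
    """Return (speaking, committed_count) after replaying event names in order."""
    speaking = False
    committed = 0
    for kind in event_types:
        if kind == "input_audio_buffer.speech_started":
            speaking = True
        elif kind == "input_audio_buffer.speech_stopped":
            speaking = False
        elif kind == "input_audio_buffer.committed":
            # Sent after automatic submission (server_vad) or a client commit.
            committed += 1
    return speaking, committed
```

In non-server_vad mode the speech events never arrive, so `speaking` stays False and only the commit count changes.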
input_audio_buffer.cleared
After the client sends an input_audio_buffer.clear event, the server returns an input_audio_buffer.cleared event.
conversation.item.created
This event is returned when a conversation item is created.
conversation.item.input_audio_transcription.completed
This event contains the audio transcription that is generated after the user's audio has been written to the audio buffer. Although the Realtime model accepts audio input, the transcription is handled by a separate process. This process runs on a dedicated automatic speech recognition (ASR) model, which is currently gummy-realtime-v1. The transcribed text may differ from the model's interpretation and is for reference only.
conversation.item.input_audio_transcription.failed
If input audio transcription is enabled and fails, the server returns the conversation.item.input_audio_transcription.failed event. This event is separate from other error events so that the client can identify the related item.
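One way a client might keep transcription results tied to their conversation items is sketched below. It assumes both events carry an `item_id` field and that the completed event carries the text in a `transcript` field; these field names are assumptions, not confirmed by this topic.

```python
COMPLETED = "conversation.item.input_audio_transcription.completed"
FAILED = "conversation.item.input_audio_transcription.failed"


def record_transcription(event: dict, transcripts: dict, failed: set) -> None:
    """Store a transcript, or note a failure, under the related item ID."""
    kind = event.get("type")
    item_id = event.get("item_id")  # assumed field name
    if kind == COMPLETED:
        # Reference text only; it may differ from the model's interpretation.
        transcripts[item_id] = event.get("transcript", "")  # assumed field name
    elif kind == FAILED:
        failed.add(item_id)
```

Keeping failures in a separate set lets the interface mark an item as "no transcript available" without treating the whole session as broken.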
response.created
The server sends this event when it starts generating a new model response.
response.done
The server returns this event when response generation is complete. The Response object in this event contains all output items from the response but excludes the raw audio data that was already sent in previous events.
response.text.delta
The server returns the response.text.delta event as the model incrementally generates new text.
response.text.done
The server returns the response.text.done event when the model finishes generating text.
This event is also returned when a response is interrupted, incomplete, or canceled.
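A minimal sketch of assembling streamed text from the delta and done events. It assumes each response.text.delta event carries its new text fragment in a `delta` field (an assumption), and the class name is hypothetical.

```python
class TextAccumulator:
    """Collect text fragments until response.text.done arrives."""

    def __init__(self):
        self._parts = []
        self.final = None  # set once response.text.done is seen

    def feed(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.text.delta":
            self._parts.append(event.get("delta", ""))  # assumed field name
        elif kind == "response.text.done":
            # May be partial text if the response was interrupted or canceled.
            self.final = "".join(self._parts)
            self._parts = []
```

Because the done event also follows an interrupted or canceled response, the value in `final` should be treated as "what was generated", not necessarily a complete answer.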
response.audio.delta
The server returns the response.audio.delta event as the model incrementally generates new audio data.
response.audio.done
The server returns the response.audio.done event when the model finishes generating audio data.
This event is also returned when a response is interrupted, incomplete, or canceled.
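The audio stream can be collected in a similar way. The sketch below assumes each response.audio.delta event carries a Base64-encoded audio chunk in a `delta` field; both the field name and the encoding are assumptions that should be checked against the event's parameter list.

```python
import base64


def collect_audio(events) -> bytes:
    """Concatenate decoded audio chunks until response.audio.done is seen."""
    chunks = bytearray()
    for event in events:
        kind = event.get("type")
        if kind == "response.audio.delta":
            # Assumed: `delta` holds Base64-encoded audio bytes.
            chunks.extend(base64.b64decode(event["delta"]))
        elif kind == "response.audio.done":
            break
    return bytes(chunks)
```

For real-time playback a client would hand each decoded chunk to the audio device as it arrives instead of buffering the whole response.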
response.audio_transcript.delta
The server returns the response.audio_transcript.delta event as the model incrementally generates text corresponding to the new audio.
response.audio_transcript.done
The server returns the response.audio_transcript.done event when the model finishes generating the text that corresponds to the new audio.
response.output_item.added
The server sends this event when a new output item is added.
response.output_item.done
The server returns this event when the output for the new item is complete.
response.content_part.added
The server sends this event when a new content part is added.
response.content_part.done
The server returns this event when the output for the new content part is complete.
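Since every server event in this topic is identified by its name, a client can route events through a small lookup table. The sketch below assumes the name is carried in a `type` field of the decoded event; the `EventRouter` class is hypothetical and not part of any SDK.

```python
from collections import defaultdict


class EventRouter:
    """Route decoded server events to handlers registered per event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type: str, handler) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, event: dict) -> int:
        """Call every handler for the event's type; return how many ran."""
        handlers = self._handlers.get(event.get("type"), [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

A return value of 0 indicates an event with no registered handler, which is a convenient place to log event names the client does not yet support.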