Skip to content

Minh/s2s context summary #1813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Minh/s2s context summary #1813

wants to merge 11 commits into from

Conversation

minh-hoque
Copy link

@minh-hoque minh-hoque commented May 5, 2025

Summary

This PR introduces a fully-worked Jupyter notebook, examples/Context_summarization_with_realtime_api.ipynb, that demonstrates how to:

  1. Stream live microphone audio into the OpenAI Realtime (voice-to-voice) API via WebSockets.
  2. Play assistant speech back in near-real-time.
  3. Track running token usage for both text and audio tokens.
  4. Auto-summarize and prune older turns once the context window nears a configurable threshold (demo ≈ 2 k)

Motivation

Long-running voice sessions quickly accumulate LARGE amounts of audio tokens, causing latency and quality drift. This demo provides a reference implementation of rolling summarization that compresses earlier dialogue into a single assistant message (in French for the demo), deletes superseded items server-side, and keeps only the last N turns verbatim—saving cost and preserving quality.


For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant