-
Notifications
You must be signed in to change notification settings - Fork 467
feat(openai): instrument openai responses prompts #15159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…gration This update introduces the ability to capture prompt metadata (id, version, variables) for reusable prompts in the OpenAI integration. The changes include enhancements to the `openai_set_meta_tags_from_response` function to validate and store prompt data, as well as new tests to ensure the correct functionality. A new YAML cassette for testing responses with prompt tracking has also been added.
…uctions This update introduces a new function, `_extract_chat_template_from_instructions`, which extracts chat templates from OpenAI response instructions by replacing variable values with placeholders. Additionally, the `openai_set_meta_tags_from_response` function has been modified to utilize this new functionality, ensuring that chat templates are included in the prompt data. Tests have been added to verify the correct extraction and formatting of chat templates, including variable placeholders.
|
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 207 ± 1 ms. The average import time from base is: 209 ± 2 ms. The import time difference between this PR and base is: -1.87 ± 0.09 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate alex/MLOB-4411_instrument-openai-responses-prompts (d4ee140) with baseline main (aeb5df4) 📈 Performance Regressions (1 suite)📈 iast_aspects - 40/40✅ re_expand_aspectTime: ✅ 32.354µs (SLO: <40.000µs 📉 -19.1%) vs baseline: +1.6% Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +5.5% ✅ re_expand_noaspectTime: ✅ 28.395µs (SLO: <40.000µs 📉 -29.0%) vs baseline: -1.2% Memory: ✅ 37.552MB (SLO: <39.000MB -3.7%) vs baseline: +5.4% ✅ re_findall_aspectTime: ✅ 2.908µs (SLO: <10.000µs 📉 -70.9%) vs baseline: ~same Memory: ✅ 37.493MB (SLO: <39.000MB -3.9%) vs baseline: +5.1% ✅ re_findall_noaspectTime: ✅ 1.436µs (SLO: <10.000µs 📉 -85.6%) vs baseline: +1.7% Memory: ✅ 37.532MB (SLO: <39.000MB -3.8%) vs baseline: +5.2% ✅ re_finditer_aspectTime: ✅ 4.384µs (SLO: <10.000µs 📉 -56.2%) vs baseline: ~same Memory: ✅ 37.473MB (SLO: <39.000MB -3.9%) vs baseline: +5.1% ✅ re_finditer_noaspectTime: ✅ 1.411µs (SLO: <10.000µs 📉 -85.9%) vs baseline: +0.7% Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +5.2% ✅ re_fullmatch_aspectTime: ✅ 2.658µs (SLO: <10.000µs 📉 -73.4%) vs baseline: +2.1% Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +5.3% ✅ re_fullmatch_noaspectTime: ✅ 1.303µs (SLO: <10.000µs 📉 -87.0%) vs baseline: +1.6% Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +5.2% ✅ re_group_aspectTime: ✅ 2.941µs (SLO: <10.000µs 📉 -70.6%) vs baseline: +2.0% Memory: ✅ 37.454MB (SLO: <39.000MB -4.0%) vs baseline: +4.9% ✅ re_group_noaspectTime: ✅ 1.618µs (SLO: <10.000µs 📉 -83.8%) vs baseline: +0.9% Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +5.2% ✅ re_groups_aspectTime: ✅ 3.099µs (SLO: <10.000µs 📉 -69.0%) vs baseline: +2.7% Memory: ✅ 37.532MB (SLO: <39.000MB -3.8%) vs baseline: +5.2% ✅ re_groups_noaspectTime: ✅ 1.725µs (SLO: <10.000µs 📉 -82.7%) vs baseline: +0.5% Memory: ✅ 37.532MB (SLO: <39.000MB -3.8%) vs baseline: +5.1% ✅ re_match_aspectTime: ✅ 2.947µs (SLO: <10.000µs 📉 -70.5%) vs baseline: 📈 +10.4% Memory: ✅ 37.552MB (SLO: <39.000MB -3.7%) vs baseline: +5.1% ✅ re_match_noaspectTime: ✅ 1.306µs (SLO: <10.000µs 📉 -86.9%) vs baseline: ~same Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +5.1% ✅ re_search_aspectTime: ✅ 2.507µs (SLO: <10.000µs 📉 -74.9%) vs baseline: -0.8% Memory: ✅ 37.552MB (SLO: <39.000MB -3.7%) vs baseline: +5.1% ✅ re_search_noaspectTime: ✅ 1.213µs (SLO: <10.000µs 📉 -87.9%) vs baseline: +1.7% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +5.2% ✅ re_sub_aspectTime: ✅ 3.380µs (SLO: <10.000µs 📉 -66.2%) vs baseline: -0.9% Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +5.2% ✅ re_sub_noaspectTime: ✅ 1.541µs (SLO: <10.000µs 📉 -84.6%) vs baseline: +1.0% Memory: ✅ 37.532MB (SLO: <39.000MB -3.8%) vs baseline: +5.3% ✅ re_subn_aspectTime: ✅ 3.601µs (SLO: <10.000µs 📉 -64.0%) vs baseline: -2.4% Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +5.2% ✅ re_subn_noaspectTime: ✅ 1.616µs (SLO: <10.000µs 📉 -83.8%) vs baseline: ~same Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +5.1% 🟡 Near SLO Breach (5 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 20.408ms (SLO: <22.300ms -8.5%) vs baseline: -0.3% Memory: ✅ 66.121MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.0% ✅ exception-replay-enabledTime: ✅ 1.336ms (SLO: <1.450ms -7.8%) vs baseline: -0.2% Memory: ✅ 64.028MB (SLO: <67.000MB -4.4%) vs baseline: +4.7% ✅ iastTime: ✅ 20.442ms (SLO: <22.250ms -8.1%) vs baseline: -0.1% Memory: ✅ 66.227MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.1% ✅ profilerTime: ✅ 15.532ms (SLO: <16.550ms -6.2%) vs baseline: ~same Memory: ✅ 54.009MB (SLO: <54.500MB 🟡 -0.9%) vs baseline: +5.1% ✅ resource-renamingTime: ✅ 20.548ms (SLO: <21.750ms -5.5%) vs baseline: ~same Memory: ✅ 66.159MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.0% ✅ span-code-originTime: ✅ 25.370ms (SLO: <28.200ms 📉 -10.0%) vs baseline: -0.2% Memory: ✅ 67.257MB (SLO: <69.500MB -3.2%) vs baseline: +5.1% ✅ tracerTime: ✅ 20.449ms (SLO: <21.750ms -6.0%) vs baseline: ~same Memory: ✅ 66.170MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.2% ✅ tracer-and-profilerTime: ✅ 22.629ms (SLO: <23.500ms -3.7%) vs baseline: -0.2% Memory: ✅ 67.840MB (SLO: <68.000MB 🟡 -0.2%) vs baseline: +5.3% ✅ tracer-dont-create-db-spansTime: ✅ 19.346ms (SLO: <21.500ms 📉 -10.0%) vs baseline: +0.2% Memory: ✅ 66.099MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +4.9% ✅ tracer-minimalTime: ✅ 16.637ms (SLO: <17.500ms -4.9%) vs baseline: ~same Memory: ✅ 66.047MB (SLO: <67.000MB 🟡 -1.4%) vs baseline: +4.9% ✅ tracer-nativeTime: ✅ 20.477ms (SLO: <21.750ms -5.9%) vs baseline: +0.3% Memory: ✅ 67.751MB (SLO: <72.500MB -6.6%) vs baseline: +5.3% ✅ tracer-no-cachesTime: ✅ 18.449ms (SLO: <19.650ms -6.1%) vs baseline: -0.3% Memory: ✅ 66.199MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.3% ✅ tracer-no-databasesTime: ✅ 18.809ms (SLO: <20.100ms -6.4%) vs baseline: +0.3% Memory: ✅ 65.942MB (SLO: <67.000MB 🟡 -1.6%) vs baseline: +5.1% ✅ tracer-no-middlewareTime: ✅ 20.188ms (SLO: <21.500ms -6.1%) vs baseline: +0.2% Memory: ✅ 66.160MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +4.9% ✅ tracer-no-templatesTime: ✅ 20.266ms (SLO: <22.000ms -7.9%) vs baseline: -0.3% Memory: ✅ 66.110MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.1% 🟡 errortrackingdjangosimple - 6/6✅ errortracking-enabled-allTime: ✅ 17.999ms (SLO: <19.850ms -9.3%) vs baseline: -0.2% Memory: ✅ 66.218MB (SLO: <66.500MB 🟡 -0.4%) vs baseline: +5.0% ✅ errortracking-enabled-userTime: ✅ 18.117ms (SLO: <19.400ms -6.6%) vs baseline: +0.6% Memory: ✅ 66.237MB (SLO: <66.500MB 🟡 -0.4%) vs baseline: +5.2% ✅ tracer-enabledTime: ✅ 18.156ms (SLO: <19.450ms -6.7%) vs baseline: +0.6% Memory: ✅ 65.864MB (SLO: <66.500MB 🟡 -1.0%) vs baseline: +5.2% 🟡 errortrackingflasksqli - 6/6✅ errortracking-enabled-allTime: ✅ 2.067ms (SLO: <2.300ms 📉 -10.1%) vs baseline: -0.1% Memory: ✅ 52.632MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +4.9% ✅ errortracking-enabled-userTime: ✅ 2.065ms (SLO: <2.250ms -8.2%) vs baseline: -0.4% Memory: ✅ 52.514MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.9% ✅ tracer-enabledTime: ✅ 2.064ms (SLO: <2.300ms 📉 -10.3%) vs baseline: -0.2% Memory: ✅ 52.632MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +5.2% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 4.600ms (SLO: <4.750ms -3.2%) vs baseline: -0.3% Memory: ✅ 62.360MB (SLO: <65.000MB -4.1%) vs baseline: +5.1% ✅ appsec-postTime: ✅ 6.640ms (SLO: <6.750ms 🟡 -1.6%) vs baseline: +0.1% Memory: ✅ 62.354MB (SLO: <65.000MB -4.1%) vs baseline: +5.0% ✅ appsec-telemetryTime: ✅ 4.591ms (SLO: <4.750ms -3.4%) vs baseline: -0.6% Memory: ✅ 62.336MB (SLO: <65.000MB -4.1%) vs baseline: +5.0% ✅ debuggerTime: ✅ 1.858ms (SLO: <2.000ms -7.1%) vs baseline: ~same Memory: ✅ 45.254MB (SLO: <47.000MB -3.7%) vs baseline: +4.7% ✅ iast-getTime: ✅ 1.855ms (SLO: <2.000ms -7.2%) vs baseline: -0.2% Memory: ✅ 42.147MB (SLO: <49.000MB 📉 -14.0%) vs baseline: +5.1% ✅ profilerTime: ✅ 1.911ms (SLO: <2.100ms -9.0%) vs baseline: ~same Memory: ✅ 46.700MB (SLO: <47.000MB 🟡 -0.6%) vs baseline: +5.4% ✅ resource-renamingTime: ✅ 3.368ms (SLO: <3.650ms -7.7%) vs baseline: +0.2% Memory: ✅ 52.623MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +5.2% ✅ tracerTime: ✅ 3.356ms (SLO: <3.650ms -8.1%) vs baseline: ~same Memory: ✅ 52.599MB (SLO: <53.500MB 🟡 -1.7%) vs baseline: +5.1% ✅ tracer-nativeTime: ✅ 3.350ms (SLO: <3.650ms -8.2%) vs baseline: ~same Memory: ✅ 54.137MB (SLO: <60.000MB -9.8%) vs baseline: +5.0% 🟡 flasksqli - 6/6✅ appsec-enabledTime: ✅ 3.965ms (SLO: <4.200ms -5.6%) vs baseline: +0.3% Memory: ✅ 62.403MB (SLO: <66.000MB -5.4%) vs baseline: +5.1% ✅ iast-enabledTime: ✅ 2.438ms (SLO: <2.800ms 📉 -12.9%) vs baseline: +0.3% Memory: ✅ 59.179MB (SLO: <60.000MB 🟡 -1.4%) vs baseline: +5.0% ✅ tracer-enabledTime: ✅ 2.067ms (SLO: <2.250ms -8.1%) vs baseline: +0.6% Memory: ✅ 52.671MB (SLO: <54.500MB -3.4%) vs baseline: +5.2%
|
Description
Adds prompt tracking for OpenAI reusable prompts.
The problem: OpenAI returns rendered prompts (with variables filled in), but prompt tracking needs templates with placeholders like
{{variable_name}}.The solution: Reverse templating - reconstruct the template by replacing variable values with placeholders.
How it works:
Why single-pass regex + longest values first?
Simple
.replace()in a loop breaks with overlapping values:This handles most overlaps, but making it perfect is probably impossible heuristically. We're aiming for it to work simply in typical real-world scenarios.
Testing
Added
test_response_with_prompt_tracking()verifying prompt metadata, chat_template extraction, and placeholder replacement.Risks
Making this perfect is likely impossible since we're reverse-engineering the template from rendered output. The approach works well for typical real-world usage where:
Additional Notes
OpenAI doesn't expose templates via API, so we reconstruct them. If they add template retrieval later or backend supports template-less prompts, we can remove this logic.