TaskAdherence V2 prompt updates #41616
Conversation
Pull Request Overview
This PR modernizes the TaskAdherence evaluator by overhauling the prompt to expect a structured JSON output and adding helper routines to normalize conversation inputs for more reliable LLM calls.
- Switched the prompt's response format to `json_object` and restructured the system prompt with a clear JSON schema and examples.
- Updated the evaluator implementation (`_task_adherence.py`) to apply the new formatting helpers and handle dictionary outputs from the LLM, including exposing full `additional_details`.
- Extended `utils.py` with three new functions (`reformat_conversation_history`, `reformat_agent_response`, and `reformat_tool_definitions`) to prepare inputs for the prompt.
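The helper names above come from the PR, but their bodies are not shown here. A minimal sketch of what a conversation-history reformatter like `reformat_conversation_history` might do (the real implementation in `utils.py` may differ):

```python
def reformat_conversation_history(query):
    """Flatten a list of chat messages into a labeled transcript string.

    Hypothetical sketch: the actual helper in utils.py may format roles,
    tool calls, and fallbacks differently.
    """
    if not isinstance(query, list):
        return str(query)
    lines = []
    for message in query:
        role = str(message.get("role", "unknown")).upper()
        content = message.get("content", "")
        lines.append(f"{role}: {content}")
    return "\n".join(lines)


history = [
    {"role": "system", "content": "Always use tools for factual queries."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
]
print(reformat_conversation_history(history))
# SYSTEM: Always use tools for factual queries.
# USER: What's the weather in Tokyo?
```

Producing a single labeled string like this keeps the prompt template simple: the LLM sees one transcript block rather than a raw Python list of message dicts.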
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/.../task_adherence.prompty | Changed output type to JSON object and rewrote the system prompt to define keys, steps, and scoring examples |
| sdk/.../_task_adherence/_task_adherence.py | Removed old parsing helper, imported new reformatters, mutated `eval_input`, unified output handling |
| sdk/.../_common/utils.py | Added conversation/response/tool-definition reformatters with fallback logic; imported `ErrorMessage` |
Comments suppressed due to low confidence (1)
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_task_adherence/_task_adherence.py:145
- [nitpick] Mutating `eval_input` in place may obscure the original data flow. Consider assigning the reformatted value to a new local variable to preserve the raw input for debugging.
eval_input['query'] = reformat_conversation_history(eval_input["query"])
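A non-mutating alternative to the flagged line could bind the formatted history into a fresh dict instead of overwriting the key. This is an illustrative sketch, not the PR's code; the stub helper stands in for the real one in `utils.py`:

```python
def reformat_conversation_history(query):
    # Stand-in for the real helper: join messages into one transcript string.
    return "\n".join(f"{m['role']}: {m['content']}" for m in query)


eval_input = {"query": [{"role": "user", "content": "hi"}]}

# Build a copy with the formatted history instead of mutating eval_input,
# so the raw message list stays available for debugging.
prompt_input = {**eval_input, "query": reformat_conversation_history(eval_input["query"])}

assert eval_input["query"] == [{"role": "user", "content": "hi"}]  # raw input preserved
```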
    try:
        conversation_history = _get_conversation_history(query)
        return _pretty_format_conversation_history(conversation_history)
    except:
Avoid using a bare `except`, which can hide unexpected errors; catch specific exceptions (e.g., `ValueError`, `KeyError`) and log the exception to aid debugging.
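The suggestion above could look like the following. This is a hedged sketch (the helper name and exception choices are illustrative, not the PR's final code):

```python
import logging

logger = logging.getLogger(__name__)


def format_history_safely(query):
    # Catch only the failures malformed message inputs are likely to raise,
    # and log them, rather than a bare `except` that hides everything.
    try:
        return "\n".join(f"{m['role']}: {m['content']}" for m in query)
    except (TypeError, KeyError) as exc:
        logger.warning("Could not format conversation history: %s", exc)
        return str(query)
```

Narrow exception clauses keep genuine bugs (e.g., a `NameError` in the formatter) visible instead of silently falling back.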
    # Higher inter-model variance (0.345 vs 0.607)
    # Lower percentage of mode in Likert scale (73.4% vs 75.4%)
    # Lower pairwise agreement between LLMs (85% vs 90% at the pass/fail level with threshold of 3)
    return query
The fallback returns the original message list rather than a string, which may break downstream prompts that expect a formatted string. Consider serializing or stringifying the original query for consistency.
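One way to address this comment is to serialize the fallback value so the function always returns a string. A sketch under that assumption (the function name is hypothetical):

```python
import json


def stringify_query(query):
    # Downstream prompts expect a formatted string, so serialize the
    # original message list instead of returning it unchanged.
    if isinstance(query, str):
        return query
    try:
        return json.dumps(query, default=str)
    except (TypeError, ValueError):
        return str(query)
```

`json.dumps` keeps the fallback readable and round-trippable, while `default=str` covers non-serializable values such as datetimes.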
…into ghyadav/task_adherence_v2

Conflicts:
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_common/utils.py
- sdk/evaluation/azure-ai-evaluation/tests/unittests/test_utils.py
    CONVERSATION_HISTORY: |
      SYSTEM MESSAGE: Always use tools for factual queries. Never provide personal opinions.
      User: What's the weather in Tokyo?
    AGENT_RESPONSE: |
How are we defining multi-turn vs. single-turn few-shot examples? In the multi-turn case, do we still call this AGENT_RESPONSE?