
AOAI results nits: fix order and _result column #40897


Merged: 2 commits merged into Azure:main on May 6, 2025

Conversation

MilesHolland (Member)

Apparently OAI likes to return their results in backwards (or possibly random) order, so I've changed the OAI results parsing to sort the results before merging them with the other evaluation results.

There was also a bug in the pass/fail _result column code that always produced the same value for the entire column. It has been changed to properly compute a per-row result.
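
A minimal sketch of the ordering fix, assuming each raw OAI result row carries an identifier for the input row it belongs to; the names raw_results, datasource_item_id, and sort_oai_results are illustrative only, not the actual identifiers in _evaluate_aoai.py:

    from typing import Any, Dict, List

    def sort_oai_results(raw_results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # OAI may return rows in reverse (or arbitrary) order; sort by the row id
        # each result carries so rows line up with the other evaluation results
        # before the two result sets are merged.
        return sorted(raw_results, key=lambda row: row["datasource_item_id"])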

@Copilot Copilot AI review requested due to automatic review settings May 5, 2025 20:04
@MilesHolland MilesHolland requested a review from a team as a code owner May 5, 2025 20:04
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR fixes two issues with the AOAI evaluation results: ensuring that the results are merged in the correct order, and correcting a bug where the pass/fail _result column was producing a single value for all rows.

  • The results dictionary is now initialized with an explicit "index" key to track row order.
  • The _result column for each grader is now built by appending values per row from the raw results (see the sketch after this list).
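
A minimal sketch of the per-row construction described above; PASS_THRESHOLD, grader_name, and the row fields are assumed names for illustration, not the SDK's actual identifiers:

    PASS_THRESHOLD = 3.0
    grader_name = "my_grader"
    sorted_rows = [
        {"datasource_item_id": 0, "score": 4.0},
        {"datasource_item_id": 1, "score": 2.0},
    ]

    listed_results = {"index": []}                    # row order tracked explicitly
    result_column = f"outputs.{grader_name}._result"
    listed_results[result_column] = []

    for row in sorted_rows:
        listed_results["index"].append(row["datasource_item_id"])
        # Append one pass/fail value per row instead of a single value for the whole column.
        listed_results[result_column].append(
            "pass" if row["score"] >= PASS_THRESHOLD else "fail"
        )
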
Comments suppressed due to low confidence (2)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate_aoai.py:243

  • [nitpick] Using the key 'index' might be confusing since it is commonly associated with a DataFrame's index. Consider renaming it to 'datasource_item_id' for better clarity.
listed_results = {"index": []}
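
The suggested rename would look something like this (hypothetical; whether the merged code adopted it is not shown here):

    listed_results = {"datasource_item_id": []}  # renamed from "index" per the review suggestion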

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate_aoai.py:257

  • [nitpick] The use of a magic number (50) to limit the length of result_column_name could be unclear to future maintainers. Consider defining and using a named constant for this limit.
if len(result_column_name) < 50:
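
A sketch of the suggested named constant; the constant name and surrounding values are assumptions for illustration:

    MAX_RESULT_COLUMN_NAME_LENGTH = 50  # hypothetical named constant replacing the literal 50
    result_column_name = "outputs.my_grader._result"

    if len(result_column_name) < MAX_RESULT_COLUMN_NAME_LENGTH:
        ...  # keep the short column name (behavior elided, as in the original snippet)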

@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label May 5, 2025
@MilesHolland MilesHolland merged commit b661c88 into Azure:main May 6, 2025
20 checks passed