soares-f
Contributor

This PR adds a notebook with an example of how to perform reliability scoring analysis in a win-tie-loss human evaluation.

Added a notebook supporting the GTC talk about the human touch.
It serves as a basis for producing a REL score for a win-tie-loss human evaluation with 2 models.
@soares-f
Contributor Author

Tag @fsoares on Slack if needed.

@dglogo dglogo self-requested a review March 17, 2025 23:44
@dglogo dglogo merged commit d8882c5 into NVIDIA:main Mar 17, 2025
anniesurla pushed a commit to anniesurla/GenerativeAIExamples that referenced this pull request Jun 5, 2025
* Example of reliability scoring in human eval

Added notebook supporting GTC talk about the human touch.
This serves as basis to produce a REL score for win-tie-loss human evaluation with 2 models.

* moved files around and created folder