Basic LightEval integration #11
Pull Request Overview
This PR integrates LightEval into the ether0 benchmark by adding new SampleLevelMetric evaluations and task configurations while also updating dependency pins.
- Introduces new LightEval tasks and metrics in ether0/lighteval_tasks.py.
- Refactors problem type filtering functions in ether0/models.py.
- Adds integration tests for custom tasks in tests/test_lighteval_tasks.py and updates dependency configurations in the pyproject.toml files.
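As a rough illustration of the metrics described above, the sketch below scores one sample at a time with a reward function and then averages the per-sample scores over the corpus. It does not import LightEval, and the bodies of `accuracy_reward` and `format_reward` are hypothetical stand-ins (assuming answers are wrapped in `<answer>` tags), not the implementations in ether0; `extract_answer`, `sample_accuracy`, `sample_format`, and `corpus_score` are likewise illustrative names.

```python
from statistics import mean
from typing import Optional

OPEN_TAG, CLOSE_TAG = "<answer>", "</answer>"


def extract_answer(completion: str) -> Optional[str]:
    """Return the text between the answer tags, or None if they are missing."""
    start = completion.find(OPEN_TAG)
    end = completion.find(CLOSE_TAG)
    if start == -1 or end == -1 or end < start:
        return None
    return completion[start + len(OPEN_TAG):end].strip()


def format_reward(completion: str) -> float:
    """Hypothetical stand-in: 1.0 when the completion contains a tagged answer."""
    return 1.0 if extract_answer(completion) is not None else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """Hypothetical stand-in: 1.0 when the tagged answer equals the gold string."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == gold.strip() else 0.0


def sample_accuracy(predictions: list[str], golds: list[str]) -> float:
    """Sample-level score: first prediction checked against the first gold."""
    return accuracy_reward(predictions[0], golds[0])


def sample_format(predictions: list[str], golds: list[str]) -> float:
    """Sample-level score: only the formatting of the first prediction matters."""
    return format_reward(predictions[0])


def corpus_score(sample_scores: list[float]) -> float:
    """Corpus-level aggregation: plain mean of the per-sample scores."""
    return mean(sample_scores)


if __name__ == "__main__":
    completions = ["<answer>CCO</answer>", "CCO", "<answer>CO</answer>"]
    golds = ["CCO", "CCO", "CCO"]
    accuracy = [sample_accuracy([c], [g]) for c, g in zip(completions, golds)]
    fmt = [sample_format([c], [g]) for c, g in zip(completions, golds)]
    print("accuracy:", corpus_score(accuracy))
    print("format:", corpus_score(fmt))
```

In an actual integration, functions shaped like `sample_accuracy` and an aggregator like `corpus_score` would presumably be handed to LightEval's `SampleLevelMetric`; the exact constructor arguments depend on the LightEval version and are not shown here.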
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/test_lighteval_tasks.py | Added integration tests for custom LightEval tasks. |
src/ether0/models.py | Refactored problem type filtering for clearer separation of concerns. |
src/ether0/lighteval_tasks.py | Introduced new tasks and metric evaluation functions for LightEval integration. |
pyproject.toml | Updated dependency list to include LightEval extras. |
packages/remotes/pyproject.toml | Adjusted tensorboard pin version to ensure compatibility. |
Comments suppressed due to low confidence (1)
packages/remotes/pyproject.toml:45
- Verify that downgrading the tensorboard pin to >=2.18 does not impact the features relied on in the remotes package; update or add tests if necessary.
"tensorboard>=2.18", # Indirect dependency we pin to keep recent
      "ether0",
      "ether0.remotes[serve]",
    - "tensorboard>=2.19", # Indirect dependency we pin to keep recent
    + "tensorboard>=2.18", # Indirect dependency we pin to keep recent
I loosened the pinning here to allow for package resolution, as lighteval==0.10.0 requires numpy v1: huggingface/lighteval#416
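To make the effect of the loosened pin concrete, here is a minimal sketch of what moving the lower bound from `>=2.19` to `>=2.18` admits. It is a simplified stand-in for real specifier matching (it handles only dotted numeric release versions, not the full PEP 440 grammar), and `parse_release` and `satisfies_minimum` are hypothetical helper names, not part of any packaging tool.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Split a dotted numeric version such as '2.18.0' into integers."""
    return tuple(int(part) for part in version.split("."))


def pad(release: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Right-pad with zeros so that '2.18' compares equal to '2.18.0'."""
    return release + (0,) * (width - len(release))


def satisfies_minimum(candidate: str, minimum: str) -> bool:
    """Simplified check for a '>=minimum' pin on numeric release versions."""
    cand, low = parse_release(candidate), parse_release(minimum)
    width = max(len(cand), len(low))
    return pad(cand, width) >= pad(low, width)


if __name__ == "__main__":
    for version in ("2.17.1", "2.18.0", "2.19.0"):
        old = satisfies_minimum(version, "2.19")
        new = satisfies_minimum(version, "2.18")
        print(f"tensorboard {version}: >=2.19 -> {old}, >=2.18 -> {new}")
```

The only behavioral difference is that 2.18.x releases become acceptable, which gives the resolver more room to find a set of versions compatible with the other constraints.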
This PR adds:
- SampleLevelMetrics, from accuracy_reward and format_reward
- LightevalTaskConfig to support all different evaluation modes

To run a gpt-4o baseline:

We are forced to hit a few warnings from LightEval (huggingface/lighteval#800, huggingface/lighteval#801), but eventually a Markdown results table appears: