Record detailed test failure modes #163

Open
jdries opened this issue May 1, 2025 · 1 comment
@jdries (Contributor) commented May 1, 2025

Currently, test scenarios either fail or succeed. We notice, however, that there are different failure modes:

  • The service was not able to produce any result.
  • The service produced a result, but it was X% different for the same input data.
  • The service produced a result that was X% different, and a change in the input data was detected.

Tracking these separately on failure would give us a more nuanced picture, making it easier to spot quickly where intervention is needed (see the sketch below).

This plugin seems to do this:
https://pypi.org/project/pytest-custom-outputs/
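
A minimal, hypothetical sketch of such a classification (plain pytest, not the pytest-custom-outputs API; all names here are illustrative, not this project's actual code):

```python
# Hypothetical sketch: a plain enum plus a tagged pytest.fail() to make
# the failure mode visible in test reports, instead of bare pass/fail.
import enum
from typing import Optional

import pytest


class FailureMode(enum.Enum):
    NO_RESULT = "no-result"            # the service produced no result at all
    RESULT_DIFFERS = "result-differs"  # result was X% different, same input data
    INPUT_CHANGED = "input-changed"    # result differs and input data changed


def classify_failure(result, diff_percent: float, input_changed: bool) -> Optional[FailureMode]:
    """Map a benchmark outcome to a failure mode (None means success)."""
    if result is None:
        return FailureMode.NO_RESULT
    if diff_percent > 0:
        return FailureMode.INPUT_CHANGED if input_changed else FailureMode.RESULT_DIFFERS
    return None


def assert_benchmark_ok(result, diff_percent: float, input_changed: bool):
    mode = classify_failure(result, diff_percent, input_changed)
    if mode is not None:
        # The mode tag ends up in the failure message, so reports can be
        # grouped by failure mode rather than by a single "failed" bucket.
        pytest.fail(f"[{mode.value}] result differed by {diff_percent:.1f}%")
```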

@soxofaan (Contributor) commented

I already added tracking of the separate phases of a benchmark: connect, create-job, run-job, collect-metadata, download-actual, download-reference, and compare.

If an exception happens, it's now possible to see from the "metrics" in which phase it happened. For example, run 1007 has the metric

`['test:phase:exception', 'run-job']`

which means an exception occurred during the run-job phase.
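
A minimal sketch of how such phase tracking could be implemented (hypothetical; `record_metric` and the calls in the usage comment stand in for the suite's actual metric sink and API):

```python
# Hypothetical sketch of per-phase exception tracking.
import contextlib

metrics: list = []


def record_metric(name: str, value: str):
    # Stand-in for the benchmark suite's real metric sink.
    metrics.append([name, value])


@contextlib.contextmanager
def track_phase(phase: str):
    """Run one benchmark phase; on exception, record in which phase it happened."""
    try:
        yield
    except Exception:
        record_metric("test:phase:exception", phase)
        raise


# Usage: wrap each phase, so a failure during job execution produces
# the metric ['test:phase:exception', 'run-job']:
#
#     with track_phase("connect"):
#         connection = connect(backend_url)
#     with track_phase("run-job"):
#         job.start_and_wait()
```

Re-raising after recording keeps the normal test failure behaviour while still making the failing phase visible in the metrics.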

There is also an initial feature for the "change in input data was detected" mode: if there is a change in the derived_from links, the metric will be

["test:phase:exception", "compare:derived_from-change"],

e.g. see test_track_phase_describe_derived_from_change.
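
A minimal sketch of how such a derived_from check could look (hypothetical; it assumes STAC-style metadata with "derived_from" links and reuses the `record_metric` stand-in from the previous sketch):

```python
# Hypothetical sketch: collect the "derived_from" links from actual and
# reference STAC metadata and record a dedicated metric when the input
# data appears to have changed.
def derived_from_hrefs(stac_metadata: dict) -> set:
    return {
        link["href"]
        for link in stac_metadata.get("links", [])
        if link.get("rel") == "derived_from"
    }


def compare_derived_from(actual: dict, reference: dict):
    if derived_from_hrefs(actual) != derived_from_hrefs(reference):
        # Reuses the hypothetical record_metric() from the phase-tracking
        # sketch above.
        record_metric("test:phase:exception", "compare:derived_from-change")
```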
