
Add code to Bundleio to generate error stats #12051


Merged

Conversation

zingo
Collaborator

@zingo zingo commented Jun 27, 2025

Add a way to get error stats/metrics between actual and reference output.

cc @digantdesai @freddan80 @per @oscarandersson8218

Signed-off-by: Zingo Andersen <[email protected]>
Change-Id: Ib51b22c80954c87812b81b6fa9798ace705a555a

pytorch-bot bot commented Jun 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12051

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 49542cb with merge base 142b1c6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 27, 2025
@zingo zingo added release notes: devtools Changes to dev tooling, for example the debugger & profiler release notes: arm Changes to the ARM backend delegate ciflow/trunk partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm labels Jun 27, 2025
@zingo
Collaborator Author

zingo commented Jun 27, 2025

Hi @digantdesai and @mergennachin, this PR adds a new method/API to BundleIO, so maybe you want to involve the proper people :) Also, is the metric sane? And is the way to propagate it back to the runner sane too?

My basic intention is to be able to log/track models over time in a better way than PASS/FAIL on a fixed atol/rtol, since that easily misses cases where we could have set rtol/atol lower after improving things. Something like this is also useful for making a good guess at workable atol/rtol values, instead of a lot of trial and error with different values.

Comment on lines +424 to +433
double abs_err = std::abs(a_data[k] - e_data[k]);
double relative_divider =
std::max(std::abs(a_data[k]), std::abs(e_data[k]));
relative_divider = std::max(relative_divider, eps);
double relative_err = abs_err / relative_divider;

sum_abs += abs_err;
max_abs = std::max(max_abs, abs_err);
sum_rel += relative_err;
max_rel = std::max(max_rel, relative_err);
Collaborator Author

Is this good? I'm no ML-math-stats person, so if this can be improved we should do so, in this PR or after :)

Contributor
I think it is good for regular cases, e.g. when both input and target are in float32, and good enough for bundled program. We shouldn't expect bundled program to cover all cases, such as quantization.

@digantdesai
Contributor

I like this general direction of getting more than true/false. And since it's opt-in, we care a bit less about the binary size overhead. I will let @Gasoonjia weigh in; he is looking at this for AoT with multiple "distance" measures. Thanks @zingo.

Contributor

@Gasoonjia Gasoonjia left a comment

Thanks @zingo for the update; overall I love it. It makes the error message more meaningful.

Also, spoiler alert: we will have a new API in the 0.7 release (https://github.com/pytorch/executorch/blob/main/devtools/inspector/_inspector.py#L1365) for comparing intermediate outputs at the operator level, beyond what we have right now in bundled program, which only compares the final result! Stay tuned, and we will have a doc for a better demonstration!

@Gasoonjia Gasoonjia merged commit 59e0476 into pytorch:main Jul 2, 2025
201 checks passed
Tanish2101 pushed a commit to Tanish2101/executorch that referenced this pull request Jul 9, 2025
Add a way to get error stats/metrics between actual and reference
output.


cc @digantdesai @freddan80 @per @oscarandersson8218

Signed-off-by: Zingo Andersen <[email protected]>
@digantdesai
Contributor

Thank you both. @zingo or @Gasoonjia, we should use this compute_method_output_error_stats for other runners as well.

4 participants