Use Mat3x4 for model and view transforms to save bandwidth and ALUs #107923

Open: 1 commit to merge into base branch master


clayjohn
Member

@clayjohn clayjohn commented Jun 24, 2025

This improves performance in situations that are vertex-shader bound (i.e., scenes with a high vertex count). Early tests show a fairly broad improvement on my Intel integrated GPU, but none on my M2 MBP.

I want to test a bit more widely to get a sense of the broad impact.

Checking with the Mali Offline Compiler, this change appears to shave off a few load/store (L/S) operations and ALU operations (about 10%). So I don't expect it to make a huge difference (especially on desktop), but it's a free performance boost.
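
Editorial sketch of why a 3x4 matrix is enough: a model or view transform is affine, so the last row of its 4x4 matrix is always (0, 0, 0, 1) and carries no information. The Python below is only an illustration under assumptions: the row-major layout and the names `pack_mat3x4`, `transform_point_4x4`, and `transform_point_3x4` are hypothetical and are not taken from the PR or from Godot's shaders.

```python
# Hypothetical illustration; layout and function names are assumptions,
# not the PR's actual implementation.

def pack_mat3x4(m):
    """Drop the constant last row of a row-major 4x4 affine matrix."""
    assert m[3] == [0.0, 0.0, 0.0, 1.0], "only affine transforms can be packed"
    return [row[:] for row in m[:3]]

def transform_point_4x4(m, p):
    """Full 4x4 multiply of the point (x, y, z, 1), keeping only x, y, z."""
    v = (p[0], p[1], p[2], 1.0)
    out = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return out[:3]

def transform_point_3x4(m, p):
    """Same result from three rows: the fourth row is never read or computed."""
    x, y, z = p
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]

# Rotation about Z by 90 degrees, scale of 2 on Z, plus a translation.
model = [
    [0.0, -1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 2.0, -3.0],
    [0.0, 0.0, 0.0, 1.0],
]
packed = pack_mat3x4(model)
point = (1.0, 2.0, 3.0)

full_bytes = 16 * 4    # sixteen 32-bit floats
packed_bytes = 12 * 4  # twelve 32-bit floats

print(transform_point_4x4(model, point))
print(transform_point_3x4(packed, point))
print(full_bytes, packed_bytes)
```

In this sketch the packed form stores 48 bytes per matrix instead of 64 and skips the arithmetic for the fourth row, which is the kind of saving in loads and ALU work the comment above refers to. The actual savings in the PR depend on how the shader compiler schedules the code, so the byte counts here should not be read as a prediction of the measured 10%.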

Built on top of #107876 to avoid conflicts

@Saul2022

Can't make the artifact work on my S24 Ultra: after the wheel spin is over, it gets stuck (I downloaded the editor APK). 4.5 beta works fine, though.

@clayjohn
Member Author

> Can't make the artifact work on my S24 Ultra: after the wheel spin is over, it gets stuck (I downloaded the editor APK). 4.5 beta works fine, though.

I haven't done the mobile renderer implementation yet!

@Saul2022

> I haven't done the mobile renderer implementation yet!

I know, but even selecting the Mobile renderer gives the same result as with Forward+.

@clayjohn
Member Author

> > I haven't done the mobile renderer implementation yet!
>
> I know, but even selecting the Mobile renderer gives the same result as with Forward+.

Oh alright, I'll make sure to test on Android before marking this as ready for review.

@clayjohn clayjohn marked this pull request as ready for review July 1, 2025 19:42
@clayjohn clayjohn requested a review from a team as a code owner July 1, 2025 19:42
@clayjohn
Member Author

clayjohn commented Jul 1, 2025

Tested now on macOS, Pop!_OS, and Android.

On my M2 MacBook, I can't measure a difference.

On my Intel integrated GPU I get a consistent improvement of roughly 5% in the test scene from #68959. I get similar results on my Pixel 7 (Mali-G710) and Pixel 4 (Adreno 640).

On other scenes more broadly, I suspect the average performance benefit will be lower than 5%, since the bottleneck is often fragment processing and this does little to help with that. But at any rate, this is a free improvement to performance (in some cases) and to battery life (in all cases).

@clayjohn
Member Author

clayjohn commented Jul 1, 2025

Tagging this for 4.6 dev 1. It should be a pretty safe change, but it is a little too risky for the beta cycle.
