Faster math.next_power_of_two and refined API #5160

Open: 0xrsp wants to merge 3 commits into master

0xrsp
Contributor

@0xrsp 0xrsp commented May 14, 2025

Replace math.next_power_of_two with a faster implementation, similar to the one used in clang/gcc, and refine its API.

OLD API:
next_power_of_two :: proc "contextless" (x: int) -> int

NEW API:
next_power_of_two :: proc "contextless" (#any_int x: int) -> uint
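
For readers unfamiliar with the clang/gcc-style approach mentioned above, here is a minimal sketch in C. It is not the code from this PR: the name `next_pow2_u64`, the choice to map 0 to 1, and the choice to return 0 when the result does not fit are all assumptions made for illustration. It relies on the GCC/clang builtin `__builtin_clzll` and assumes `unsigned long long` is 64 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper (not the PR's code): smallest power of two >= x. */
static uint64_t next_pow2_u64(uint64_t x) {
    /* Initial branch: 0 and 1 both map to 1 (an assumption), and it keeps
       the builtin away from a zero argument, where its result is undefined. */
    if (x <= 1) {
        return 1;
    }
    /* If the highest set bit of x - 1 is at position p, the answer is 1 << (p + 1).
       64 - clz(x - 1) is exactly p + 1. */
    unsigned shift = 64u - (unsigned)__builtin_clzll(x - 1);
    /* For x > 2^63 the result does not fit in 64 bits; return 0 (an assumption)
       instead of shifting by 64, which would be undefined behaviour. */
    return shift < 64u ? (uint64_t)1 << shift : 0;
}
```

The whole computation is one compare, one subtract, one count-leading-zeros, and one shift, which is where a speedup over a chain of shift-and-or steps would plausibly come from.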

@gingerBill
Member

Have you actually tested this to see how much faster it is?

@0xrsp
Contributor Author

0xrsp commented May 14, 2025

Benchmarked with the LLVM backend on x86_64, it is significantly faster at most optimization levels: at least ~2x faster, and on average ~4-5x faster for most inputs, LLVM seems to produce. It also behaves better than the original implementation because the return type is unsigned, for example:
orig(1<<62+1)=-1<<63
updated(1<<62+1)=1<<63
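
The difference between the two results above comes down to how the same bit pattern is read. A small C sketch, with hypothetical helper names, assuming a 64-bit two's-complement target and the GCC/clang behaviour for out-of-range unsigned-to-signed conversion (implementation-defined in ISO C):

```c
#include <stdint.h>

/* The bit pattern a correct result for an input of (1<<62)+1 must have:
   only bit 63 set. */
static uint64_t top_bit_pattern(void) {
    return (uint64_t)1 << 63;
}

/* Hypothetical: view those bits through a signed 64-bit type, as a signed
   return type would. The conversion is implementation-defined in ISO C;
   GCC documents it as reduction modulo 2^64, which is assumed here. */
static int64_t view_as_signed(uint64_t bits) {
    return (int64_t)bits;
}
```

Read as unsigned, the pattern is 2^63, a valid power of two; read as signed, it is the most negative 64-bit value, which matches the `-1<<63` reported for the original implementation.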

I left this as a draft because the polymorphic int input could be wider than uint, so it needs a way to get an unsigned integer type with the same width as the signed input type.
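
To make the width concern concrete, here is one way the "unsigned type of the same width as the signed input" idea could look in C11, using `_Generic` to pick a per-width helper. This is only an illustration of the design question, not a proposal for the Odin API: the names `next_pow2_i32`, `next_pow2_i64`, and `next_pow2` are hypothetical, non-positive inputs are assumed to map to 1, and the GCC/clang builtins `__builtin_clz`/`__builtin_clzll` are assumed to operate on 32-bit and 64-bit values respectively.

```c
#include <stdint.h>

/* 32-bit signed input, 32-bit unsigned result (hypothetical helper). */
static uint32_t next_pow2_i32(int32_t x) {
    if (x <= 1) {
        return 1; /* assumption: non-positive inputs map to 1 */
    }
    /* x - 1 is in [1, 2^31 - 2], so clz >= 1 and the shift is at most 31. */
    unsigned shift = 32u - (unsigned)__builtin_clz((uint32_t)x - 1u);
    return (uint32_t)1 << shift;
}

/* 64-bit signed input, 64-bit unsigned result (hypothetical helper). */
static uint64_t next_pow2_i64(int64_t x) {
    if (x <= 1) {
        return 1; /* assumption: non-positive inputs map to 1 */
    }
    /* x - 1 is in [1, 2^63 - 2], so clz >= 1 and the shift is at most 63. */
    unsigned shift = 64u - (unsigned)__builtin_clzll((uint64_t)x - 1u);
    return (uint64_t)1 << shift;
}

/* Select the helper whose result type has the same width as the input. */
#define next_pow2(x) \
    _Generic((x), int32_t: next_pow2_i32, int64_t: next_pow2_i64)(x)
```

With a same-width unsigned result, every positive signed input has a representable answer, because the largest possible result is 2^(N-1) for an N-bit signed input.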

Despite the initial branch, the new version seems to be much faster, simply because it needs fewer operations and has a smaller code size.

Disassembly of new version:
image

Disassembly of old version:
image

@JesseRMeyer

That's non-optimized codegen.

https://godbolt.org/z/KM3PfPGfe

@0xrsp 0xrsp marked this pull request as ready for review May 17, 2025 16:08
3 participants