Missed optimization for unaligned store via shifts #139441

Open
0f-0b opened this issue May 11, 2025 · 1 comment

0f-0b commented May 11, 2025

This Rust code, when compiled for `aarch64-unknown-linux-gnu`, generates a single store instruction (plus the `ret`):

#[inline(never)]
pub fn write(out: &mut [u32; 2], a: u64) {
  out[0] = a as u32;
  out[1] = (a >> 32) as u32;
}

example::write::h4c19b1f2c54c5627:
        str     x1, [x0]
        ret

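However, inefficient code is emitted when there are two or more `u64` values to store: each value is shifted right with `lsr` and written as two 32-bit halves, instead of being stored as one 64-bit register.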

#[inline(never)]
pub fn write2(out: &mut [u32; 4], a: u64, b: u64) {
  out[0] = a as u32;
  out[1] = (a >> 32) as u32;
  out[2] = b as u32;
  out[3] = (b >> 32) as u32;
}

example::write2::h650f933056ff8897:
        lsr     x8, x1, #32
        lsr     x9, x2, #32
        stp     w1, w8, [x0]
        stp     w2, w9, [x0, #8]
        ret

Compiler Explorer.
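
To make the expected result concrete, here is a minimal sketch that checks the shift-based stores against the bytes two plain 64-bit little-endian stores would produce. `write2_reference` is a hypothetical helper, not part of the report. The assumption that the ideal AArch64 output would be a single 64-bit pair store (something like `stp x1, x2, [x0]`) is my reading of the issue title, not something the thread confirms.

```rust
/// Shift-based version from the report (without `#[inline(never)]`).
pub fn write2(out: &mut [u32; 4], a: u64, b: u64) {
    out[0] = a as u32;
    out[1] = (a >> 32) as u32;
    out[2] = b as u32;
    out[3] = (b >> 32) as u32;
}

/// Hypothetical reference: lay out `a` and `b` as two consecutive
/// little-endian 64-bit values, then read the 16 bytes back as four
/// little-endian u32 words.
pub fn write2_reference(a: u64, b: u64) -> [u32; 4] {
    let mut bytes = [0u8; 16];
    bytes[..8].copy_from_slice(&a.to_le_bytes());
    bytes[8..].copy_from_slice(&b.to_le_bytes());

    let mut out = [0u32; 4];
    for (word, chunk) in out.iter_mut().zip(bytes.chunks_exact(4)) {
        *word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    out
}
```

Both functions yield the same four words for any inputs, so on a little-endian target the four 32-bit stores leave memory in the same state as two 64-bit stores. That equivalence is what allows the single-value `write` case to fold into one `str`.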

llvmbot commented May 11, 2025

@llvm/issue-subscribers-backend-aarch64

Author: ud2 (0f-0b)

