Skip to content

removing needless .collect() in the middle of iterator chain reduces performance significantly #140873

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
cyrgani opened this issue May 9, 2025 · 1 comment
Labels
C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-slow Issue: Problems and improvements with respect to performance of generated code. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged.

Comments

@cyrgani
Copy link
Contributor

cyrgani commented May 9, 2025

reproduction (not minimal):
cyrgani/regulus@6c6dd75
command: cargo run --release -- tests/programs/sorting_tests.re

The relevant part is this:
https://github.com/cyrgani/regulus/blob/6c6dd758d3eadf0383b22d1acf2d7d9ccf21ac04/regulus/src/state.rs#L60-L82

With the currently present function (also below), the given test above takes about 400ms for me.

    pub fn get(&self, name: impl AsRef<str>) -> Option<&Atom> {
        let candidates = self
            .data
            .iter()
            .filter_map(|(ident, val)| {
                if ident.ident == name.as_ref() {
                    if true {
                        //let Source::Regular { .. } = ident.source {
                        Some((ident.source, val))
                    } else {
                        None
                    }
                } else {
                    None
                }
            })
            .collect::<Vec<_>>();

        candidates
            .into_iter()
            .max_by_key(|(source, _)| source.layer())
            .map(|(_, val)| val)
    }

It would seem logical now that removing the needless .collect() and .into_iter() should improve runtime (or at least not change it, if the compiler can optimize the allocation out). Instead, the simpler function causes the test to take around 600ms now (50% longer):

    pub fn get(&self, name: impl AsRef<str>) -> Option<&Atom> {
        self
            .data
            .iter()
            .filter_map(|(ident, val)| {
                if ident.ident == name.as_ref() {
                    if true {
                        //let Source::Regular { .. } = ident.source {
                        Some((ident.source, val))
                    } else {
                        None
                    }
                } else {
                    None
                }
            }
            .max_by_key(|(source, _)| source.layer())
            .map(|(_, val)| val)
    }

@rustbot label C-optimization I-slow E-needs-mcve

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-slow Issue: Problems and improvements with respect to performance of generated code. labels May 9, 2025
@moxian
Copy link
Contributor

moxian commented May 10, 2025

Unable to reproduce on my machine. The collect-less version is a tiny bit faster for me, on average.

The code as provided does sometimes get ~25% slower with collect-less version, but that's most definitely noise. Putting the let result = runner.run(); part into a for _ in 0..10 loop in order to reduce the noise somewhat does show consistently (but marginally) better performance of collect-less version

output

original .collect().into_iter():

> cargo run  --release --features a -- tests/programs/sorting_tests.re  
   Compiling frontend v0.1.0 (H:\work\my\trash\rust-mini\_users\regulus\frontend)
    Finished `release` profile [optimized] target(s) in 1.70s
     Running `target\release\frontend.exe tests/programs/sorting_tests.re`
[frontend\src\main.rs:68:5] &FUNCTION_CLONE_COUNT = 1100
[frontend\src\main.rs:69:5] &OBJECT_CLONE_COUNT = 0
[frontend\src\main.rs:70:5] &STRING_CLONE_COUNT = 20
[frontend\src\main.rs:71:5] &LIST_CLONE_COUNT = 4250
[frontend\src\main.rs:72:5] t.elapsed() = 7.9402171s

streamlined:

> cargo run  --release --features b -- tests/programs/sorting_tests.re
   Compiling frontend v0.1.0 (H:\work\my\trash\rust-mini\_users\regulus\frontend)
    Finished `release` profile [optimized] target(s) in 1.82s
     Running `target\release\frontend.exe tests/programs/sorting_tests.re`
[frontend\src\main.rs:68:5] &FUNCTION_CLONE_COUNT = 1100
[frontend\src\main.rs:69:5] &OBJECT_CLONE_COUNT = 0
[frontend\src\main.rs:70:5] &STRING_CLONE_COUNT = 20
[frontend\src\main.rs:71:5] &LIST_CLONE_COUNT = 4250
[frontend\src\main.rs:72:5] t.elapsed() = 7.5821447s

x86_64-pc-windows-msvc; ancient Intel i5-6600 CPU; rustc 1.86.0 stable, similar results with nightly-2025-05-06

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-slow Issue: Problems and improvements with respect to performance of generated code. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged.
Projects
None yet
Development

No branches or pull requests

3 participants