removing needless `.collect()` in the middle of iterator chain reduces performance significantly #140873

cyrgani · 2025-05-09T21:43:44Z

reproduction (not minimal):
cyrgani/regulus@6c6dd75
command: cargo run --release -- tests/programs/sorting_tests.re

The relevant part is this:
https://github.com/cyrgani/regulus/blob/6c6dd758d3eadf0383b22d1acf2d7d9ccf21ac04/regulus/src/state.rs#L60-L82

With the currently present function (also below), the given test above takes about 400ms for me.

    pub fn get(&self, name: impl AsRef<str>) -> Option<&Atom> {
        let candidates = self
            .data
            .iter()
            .filter_map(|(ident, val)| {
                if ident.ident == name.as_ref() {
                    if true {
                        //let Source::Regular { .. } = ident.source {
                        Some((ident.source, val))
                    } else {
                        None
                    }
                } else {
                    None
                }
            })
            .collect::<Vec<_>>();

        candidates
            .into_iter()
            .max_by_key(|(source, _)| source.layer())
            .map(|(_, val)| val)
    }

It would seem logical now that removing the needless .collect() and .into_iter() should improve runtime (or at least not change it, if the compiler can optimize the allocation out). Instead, the simpler function causes the test to take around 600ms now (50% longer):

    pub fn get(&self, name: impl AsRef<str>) -> Option<&Atom> {
        self
            .data
            .iter()
            .filter_map(|(ident, val)| {
                if ident.ident == name.as_ref() {
                    if true {
                        //let Source::Regular { .. } = ident.source {
                        Some((ident.source, val))
                    } else {
                        None
                    }
                } else {
                    None
                }
            }
            .max_by_key(|(source, _)| source.layer())
            .map(|(_, val)| val)
    }

@rustbot label C-optimization I-slow E-needs-mcve

The text was updated successfully, but these errors were encountered:

moxian · 2025-05-10T18:12:21Z

Unable to reproduce on my machine. The collect-less version is a tiny bit faster for me, on average.

The code as provided does sometimes get ~25% slower with collect-less version, but that's most definitely noise. Putting the let result = runner.run(); part into a for _ in 0..10 loop in order to reduce the noise somewhat does show consistently (but marginally) better performance of collect-less version

output

original .collect().into_iter():

> cargo run  --release --features a -- tests/programs/sorting_tests.re  
   Compiling frontend v0.1.0 (H:\work\my\trash\rust-mini\_users\regulus\frontend)
    Finished `release` profile [optimized] target(s) in 1.70s
     Running `target\release\frontend.exe tests/programs/sorting_tests.re`
[frontend\src\main.rs:68:5] &FUNCTION_CLONE_COUNT = 1100
[frontend\src\main.rs:69:5] &OBJECT_CLONE_COUNT = 0
[frontend\src\main.rs:70:5] &STRING_CLONE_COUNT = 20
[frontend\src\main.rs:71:5] &LIST_CLONE_COUNT = 4250
[frontend\src\main.rs:72:5] t.elapsed() = 7.9402171s

streamlined:

> cargo run  --release --features b -- tests/programs/sorting_tests.re
   Compiling frontend v0.1.0 (H:\work\my\trash\rust-mini\_users\regulus\frontend)
    Finished `release` profile [optimized] target(s) in 1.82s
     Running `target\release\frontend.exe tests/programs/sorting_tests.re`
[frontend\src\main.rs:68:5] &FUNCTION_CLONE_COUNT = 1100
[frontend\src\main.rs:69:5] &OBJECT_CLONE_COUNT = 0
[frontend\src\main.rs:70:5] &STRING_CLONE_COUNT = 20
[frontend\src\main.rs:71:5] &LIST_CLONE_COUNT = 4250
[frontend\src\main.rs:72:5] t.elapsed() = 7.5821447s

x86_64-pc-windows-msvc; ancient Intel i5-6600 CPU; rustc 1.86.0 stable, similar results with nightly-2025-05-06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

removing needless `.collect()` in the middle of iterator chain reduces performance significantly #140873

removing needless `.collect()` in the middle of iterator chain reduces performance significantly #140873

cyrgani commented May 9, 2025

moxian commented May 10, 2025 •

edited

Loading

removing needless .collect() in the middle of iterator chain reduces performance significantly #140873

removing needless .collect() in the middle of iterator chain reduces performance significantly #140873

Comments

cyrgani commented May 9, 2025

moxian commented May 10, 2025 • edited Loading

removing needless `.collect()` in the middle of iterator chain reduces performance significantly #140873

removing needless `.collect()` in the middle of iterator chain reduces performance significantly #140873

moxian commented May 10, 2025 •

edited

Loading