Skip to content

Core: Add Node::iterate_children as a fast way to iterate a node's children #107369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Ivorforce
Copy link
Member

@Ivorforce Ivorforce commented Jun 10, 2025

This adds a performant way to iterate a node's children.
This was discussed on RocketChat.

Currently, you either iterate get_child_count using get_child, adding immense overhead, or you iterate get_children(), which allocates a whole new array. Both are very slow.

ChildrenIterator is technically not needed (Node ** could be iterated instead), but is added to hide the nature of the children storage from the caller. In the future, we may want to change storage without needing to change caller code.

Additionally, I'm adding an Iterable template, such that children are iterated on node.iter_children() instead of on node. If node was iterated, it is not immediately clear that the children are the iteratees. Also, checks would have to be run twice otherwise (for begin() and end() separately), making it unnecessarily slow.

@Ivorforce Ivorforce added this to the 4.x milestone Jun 10, 2025
@Ivorforce Ivorforce requested review from hpvb and lawnjelly June 10, 2025 16:49
@Ivorforce Ivorforce requested a review from a team as a code owner June 10, 2025 16:49
@Ivorforce Ivorforce force-pushed the node-iter-children branch 2 times, most recently from c38fdc9 to 09f6355 Compare June 10, 2025 17:34
@lawnjelly
Copy link
Member

lawnjelly commented Jun 11, 2025

Benchmark

StressTestingNodeChildren.zip

Debug

before : 32-35 fps
after : 23-24 fps

Release

before : 55-58 fps
after : 46-50 fps

We did suspect this might be slower in debug but it also seems slower in release than the previous version (accessing the child cache directly via Span). I'll try and pin down what is slowing it down.

@lawnjelly
Copy link
Member

lawnjelly commented Jun 11, 2025

Here's a profile. Notably I'm finding that a lot of small functions aren't getting inlined .. turns out it is likely because of thread guards, which I created an issue for #107393 .

iter_children

First obvious thing is it seems to be branching on p_internal. This could be due to the function not being inlined. It might be worth experimenting with putting this branch in the header, and if necessary calling two versions of the function (if desired in the cpp), as we should be able to get this to resolve at compile time.

@Ivorforce Ivorforce force-pushed the node-iter-children branch 2 times, most recently from 90c61dd to 3087b18 Compare June 11, 2025 08:32
@Ivorforce
Copy link
Member Author

Here's a profile. Notably I'm finding that a lot of small functions aren't getting inlined .. turns out it is likely because of thread guards, which I created an issue for #107393 .

iter_children

First obvious thing is it seems to be branching on p_internal. This could be due to the function not being inlined. It might be worth experimenting with putting this branch in the header, and if necessary calling two versions of the function (if desired in the cpp), as we should be able to get this to resolve at compile time.

Thank you for profiling!
Maybe i'm just not used to it, but the profile results are curious to me. It seems like most of the performance lost is scattered around the function in various spots. I would definitely have expected the thread guard to be by far the most expensive thing in this function, at least in debug. The only other reason I can imagine is cache misses for the accessed data, but that shouldn't much affect the fps since it's accessing the same data as before.

Anyway, I agree that p_include_internal doesn't need to be a runtime variable. I made it a template instead, and also eliminated the 0 branch which wasn't needed. I wouldn't expect this to make a big difference normally, but perhaps it does in all summation?

@lawnjelly
Copy link
Member

Maybe i'm just not used to it, but the profile results are curious to me.

It's definitely worth repeating on your end (maybe with different benchmarks, loops). I do a lot of profiling, but not in master, and the scons options are not super familiar to me, so it's possible I've messed something up in the build. It does seem to be inlining some functions (confirming release build) but it's not doing a very good job with this one.

@lawnjelly
Copy link
Member

It seems most of the slowdown is due to the thread guards. We have discussed removing them for this high performance iterator, and I'll check the performance with current master.

@Ivorforce Ivorforce force-pushed the node-iter-children branch from 3087b18 to a327a95 Compare June 12, 2025 16:47
@Ivorforce Ivorforce changed the title Core: Add Node::iter_children as a fast way to iterate a node's children Core: Add Node::iterate_children as a fast way to iterate a node's children Jun 12, 2025
Copy link
Member

@lawnjelly lawnjelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a lot of testing we pinned down the speed loss to the thread guards, and reasoned that they can be removed for this performance function.

Now this seems to perform as fast (or slightly faster) than before in release, and slightly slower in dev builds, but that should be fine.

@Ivorforce also noticed that I'd missed changing one of the slow Godot 4 get_child() functions to use the span in the original, so that may explain some of the speedup.

@lawnjelly
Copy link
Member

Sorry @Ivorforce this will need a slight rebase as I fixed the get_child() in the meantime.

…children, without needing allocations or `get_child`.

Adds `Iterable` class to templates.
@Ivorforce Ivorforce force-pushed the node-iter-children branch from a327a95 to 175c38d Compare June 13, 2025 15:02
@Repiteo Repiteo modified the milestones: 4.5, 4.6 Jun 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants