Skip to content

SceneTreeFTI faster access to Node children #106224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 22, 2025

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented May 10, 2025

Adds methods to directly access cached children from outside Node:

update_cached_children()
get_cached_children()

Helps address #106185.
Supersedes #106214.

MRP

ManyStaticColliders2.zip

Before this PR: 160fps
After this PR: 250fps
Physics interpolation off: 360fps

Discussion

#104269 introduced the new 3D FTI and uses a reference approach traversing the entire scene tree which is expected to not be super fast with large numbers of nodes in the tree. This is to be considerably optimized in #105728 (which only updates branches that have changed) and #105901.

However the reason for a lot of the slowdown in #106185 was unexpected - it was due to changes made for 4.x in #75627 which seem to make it (very?) inefficient to access children via get_child_count() and get_child(). This problem does not exist in 3.x.

This is worth optimizing because although after #105728 this MRP will no longer be an issue for static nodes, the bottleneck will still exist when moving large numbers of nodes (e.g. via a parent).

Profiling makes clear what is happening (this is debug, but should represent release to an extent):

profile_traversal_master

Investigating suggested that get_child_count() and get_child() could be very inefficient, especially in debug with thread checks.

Notes

  • Access via these new methods seems to benchmark at approx 16x faster than via get_child() and get_child_count().
  • get_child() is slow because it includes a thread guard, updates child cache every call, branches on internal and index, and includes bounds checking.
  • get_child_count() is especially slow in for loops because it includes thread guard, update to child cache, branches on internal, and math.
  • Alternatives including making SceneTreeFTI a friend and accessing the cached children directly.
  • I tested allowing access to the LocalVector directly versus Span, and Span is optimized out and runs as fast as the former, so is probably a more generic solution than returning the LocalVector.
  • If we roll out this or similar approach, we should look through existing bottleneck child iteration code to see whether it can be improved, perhaps using the same paradigm / API. In particular as a first step, storing child_count in a local variable is an easy win in for loops.

I don't have strong opinions as to which method would be best for access:

  • friend discourages use in other areas of the engine (especially while evaluating whether this will work in the wild)
  • friend is usually better to use sparingly, because it makes inter dependencies in the engine less clear
  • public methods allow use from other parts of the engine
  • It would alternatively be possible to have wrapper functions for size and [] instead of returning a Span

@lawnjelly lawnjelly added this to the 4.5 milestone May 10, 2025
@lawnjelly lawnjelly requested a review from a team as a code owner May 10, 2025 00:21
@lawnjelly lawnjelly force-pushed the fti_scenetree_faster_children branch from c7e3878 to 8b42e49 Compare May 10, 2025 00:24
@lawnjelly lawnjelly marked this pull request as draft May 10, 2025 00:29
@lawnjelly lawnjelly force-pushed the fti_scenetree_faster_children branch from 8b42e49 to 312090d Compare May 10, 2025 00:36
@lawnjelly lawnjelly marked this pull request as ready for review May 10, 2025 00:42
@lawnjelly lawnjelly force-pushed the fti_scenetree_faster_children branch from 312090d to d1a3043 Compare May 10, 2025 07:53
Copy link
Member

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. We desperately need a high performance access to node children.
Using Span for this makes sense to me since it allows to not expose which container is used internally, and it has the lowest chance of accidental overhead.

Code looks good!

@Ivorforce Ivorforce requested a review from clayjohn May 10, 2025 09:58
@lawnjelly
Copy link
Member Author

Incidentally, it would also be possible I think to use a custom Span to return the subset that doesn't include internal children, but I've left that for followup if we decide to use this from more places.

e.g.
Span<Node *> get_cached_children(bool p_include_internal) ...

@lawnjelly
Copy link
Member Author

lawnjelly commented May 21, 2025

core meeting:
We've decided to go with making SceneTreeFTI a friend for now so we don't expose the API as public, and @hpvb will investigate whether we can use a different data structure for holding children in the longer term (but we will merge the approach here for now as it needs to be solved for 4.5).

I'll update the PR asap.

UPDATE:
Done.

@lawnjelly lawnjelly force-pushed the fti_scenetree_faster_children branch from d1a3043 to cbd6c8d Compare May 21, 2025 16:20
Copy link
Member

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after changes.

// Not a Node3D.
// Could be e.g. a viewport or something
// so we should still recurse to children.
if (!s) {
for (int n = 0; n < p_node->get_child_count(); n++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I particularly like how this previously called get_child_count() on every loop 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What can I say, it's cheap in 3.x. 😁

@Repiteo Repiteo merged commit 63070dd into godotengine:master May 22, 2025
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented May 22, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants