-
-
Notifications
You must be signed in to change notification settings - Fork 22.8k
SceneTreeFTI
faster access to Node
children
#106224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SceneTreeFTI
faster access to Node
children
#106224
Conversation
c7e3878
to
8b42e49
Compare
8b42e49
to
312090d
Compare
312090d
to
d1a3043
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. We desperately need a high performance access to node children.
Using Span
for this makes sense to me since it allows to not expose which container is used internally, and it has the lowest chance of accidental overhead.
Code looks good!
Incidentally, it would also be possible I think to use a custom e.g. |
core meeting: I'll update the PR asap. UPDATE: |
d1a3043
to
cbd6c8d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-approving after changes.
// Not a Node3D. | ||
// Could be e.g. a viewport or something | ||
// so we should still recurse to children. | ||
if (!s) { | ||
for (int n = 0; n < p_node->get_child_count(); n++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I particularly like how this previously called get_child_count()
on every loop 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What can I say, it's cheap in 3.x. 😁
Thanks! |
Adds methods to directly access cached children from outside
Node
:update_cached_children()
get_cached_children()
Helps address #106185.
Supersedes #106214.
MRP
ManyStaticColliders2.zip
Before this PR: 160fps
After this PR: 250fps
Physics interpolation off: 360fps
Discussion
#104269 introduced the new 3D FTI and uses a reference approach traversing the entire scene tree which is expected to not be super fast with large numbers of nodes in the tree. This is to be considerably optimized in #105728 (which only updates branches that have changed) and #105901.
However the reason for a lot of the slowdown in #106185 was unexpected - it was due to changes made for 4.x in #75627 which seem to make it (very?) inefficient to access children via
get_child_count()
andget_child()
. This problem does not exist in 3.x.This is worth optimizing because although after #105728 this MRP will no longer be an issue for static nodes, the bottleneck will still exist when moving large numbers of nodes (e.g. via a parent).
Profiling makes clear what is happening (this is debug, but should represent release to an extent):
Investigating suggested that
get_child_count()
andget_child()
could be very inefficient, especially in debug with thread checks.Notes
get_child()
andget_child_count()
.get_child()
is slow because it includes a thread guard, updates child cache every call, branches on internal and index, and includes bounds checking.get_child_count()
is especially slow in for loops because it includes thread guard, update to child cache, branches on internal, and math.SceneTreeFTI
afriend
and accessing the cached children directly.LocalVector
directly versusSpan
, andSpan
is optimized out and runs as fast as the former, so is probably a more generic solution than returning theLocalVector
.child_count
in a local variable is an easy win in for loops.I don't have strong opinions as to which method would be best for access:
friend
discourages use in other areas of the engine (especially while evaluating whether this will work in the wild)friend
is usually better to use sparingly, because it makes inter dependencies in the engine less clearpublic
methods allow use from other parts of the enginesize
and[]
instead of returning aSpan