Minor Optimization to Occlusion Culling #107839

Draft
wants to merge 1 commit into
base: master

Rudolph-B
Contributor

Minor performance improvement to occlusion culling as well as a partial fix for #106184 (I'm working on something more comprehensive).

The performance gain comes from swapping one version of Projection::xform for another that is inlined. The two functions aren't a 1-to-1 match, so a minor rework of the logic was required. After further testing, I believe the issue of small objects self-occluding (as described by @JFonS in #52545) was fully resolved in #94210. I've therefore updated the logic to be less conservative, which gives a small improvement in occlusion rate, as can be seen in the video below. The test project is similar to the MRP in #106184. The camera is placed inside a box occluder, so ideally everything should become occluded.
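As a rough illustration of why two projection transform variants may not be interchangeable, here is a minimal sketch. It assumes (this is not confirmed by the description above) that one variant returns the homogeneous 4D result while the other also performs the perspective divide. `Vec3`, `Vec4`, `Mat4`, `xform4` and `xform_point` are hypothetical stand-ins, not Godot's actual types or functions.

```cpp
#include <array>

// Hypothetical minimal types for illustration only (not Godot's Vector3/Vector4/Projection).
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// 4x4 matrix stored as four columns: cols[column][row].
struct Mat4 {
    std::array<std::array<float, 4>, 4> cols;
};

// Variant A (assumed): transform a point (implicit w = 1) and return the
// homogeneous result. The caller is responsible for dividing by w.
inline Vec4 xform4(const Mat4 &m, const Vec3 &p) {
    return Vec4{
        m.cols[0][0] * p.x + m.cols[1][0] * p.y + m.cols[2][0] * p.z + m.cols[3][0],
        m.cols[0][1] * p.x + m.cols[1][1] * p.y + m.cols[2][1] * p.z + m.cols[3][1],
        m.cols[0][2] * p.x + m.cols[1][2] * p.y + m.cols[2][2] * p.z + m.cols[3][2],
        m.cols[0][3] * p.x + m.cols[1][3] * p.y + m.cols[2][3] * p.z + m.cols[3][3],
    };
}

// Variant B (assumed): transform a point and apply the perspective divide,
// so the caller never sees w. Code that used w (for example to test the sign
// of the clip-space depth) has to be reworked when switching to this variant.
inline Vec3 xform_point(const Mat4 &m, const Vec3 &p) {
    const Vec4 h = xform4(m, p);
    const float inv_w = 1.0f / h.w; // Undefined for h.w == 0; a real caller must guard this.
    return Vec3{ h.x * inv_w, h.y * inv_w, h.z * inv_w };
}
```

With a matrix whose last row yields w = 2, `xform4` returns the undivided coordinates plus w, while `xform_point` returns the coordinates halved. Call sites that depended on w therefore need adjusting, which is one plausible reading of "not a 1-to-1 match".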

output.mp4

Regarding the performance improvements, I tested using occlusion_culling_mesh_lod. On my machine, it went from roughly 1.2 ms/frame (~830 FPS) to roughly 1.05 ms/frame (~950 FPS). Since that project is specifically designed to be bottlenecked by occlusion culling, I also tested using the tps-demo. There, I wasn't able to observe any noticeable performance improvement (~122 FPS before and after).

Godot v4.5.beta (b3d858f) - Ubuntu 24.04.2 LTS 24.04 on X11 - X11 display driver, Multi-window, 2 monitors - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 2070 SUPER (nvidia; 550.144.03) - AMD Ryzen 7 3700X 8-Core Processor (16 threads) - 14.76 GiB memory

Member

Calinou left a comment

Tested locally, it works as expected. Code looks good to me.

I spotted a very slight performance improvement (from 2480 FPS to 2498 FPS with a 64×64 viewport to test for CPU bottlenecks) with a Linux x86_64 release optimized export template binary. It's consistently faster even over several minutes and across multiple runs, just a very slight increase. In real world projects, this change is probably more about saving power than actually increasing FPS (although remember that on mobile devices, lower CPU power usage can let the GPU use more power, therefore increasing FPS in GPU-bound scenes).

PC specifications
  • CPU: AMD Ryzen 9 9950X3D
  • GPU: NVIDIA GeForce RTX 5090
  • RAM: 64 GB (2×32 GB DDR5-6000 CL30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 42)

Rudolph-B marked this pull request as draft on July 8, 2025, 11:43

Rudolph-B
Contributor Author

Converted to draft for now. There are some additional optimizations I want to make, but they will only make sense after #108347 has been merged.

3 participants