Description
Describe the project you are working on
Large scale terrain editing with lots of dense foliage and vegetation using particle systems.
Describe the problem or limitation you are having in your project
Zero-initializing GPU buffers is slow due to inefficient memory reallocation. This is especially noticeable in particle systems: changing the number of particles emitted restarts the system, and its buffers get mapped to the CPU twice.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
Zero initialization should be done on the GPU in a generic transform feedback utility to avoid stalling the pipeline.
I noticed this while studying the particle systems in depth for other optimizations. The generic method maps the buffer to the CPU only to zero-initialize it, which stalls the pipeline and makes it much slower than it should be.
https://github.com/godotengine/godot/blob/0ed1c192e87e107f507d41dae00f9859bcc45ef1/drivers/gles3/storage/particles_storage.cpp#L883
buffer_allocate_data needs a gpu_buffer_allocate_data twin (Compatibility)
storage_buffer_create needs a gpu_storage_buffer_create twin (Forward+)
This is doubly bad because the process code runs right afterwards and updates the buffer a second time anyway. So the whole first mapping to the CPU is redundant, yet it has a huge performance impact.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
GLES3/Compatibility will use a generic transform feedback when we only want to zero init a buffer. Forward+ can do the same or implement a zero init utility function in a compute shader.
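A minimal sketch of what the Compatibility path could look like, under some assumptions: every name here (zero_fill_padded_size, zero_fill_vertex_count, ZERO_FILL_VERTEX_SHADER, gpu_buffer_allocate_zeroed, and the ZERO_FILL_WITH_GLES3 macro) is hypothetical and not existing Godot API. It also assumes the program was linked with a trivial fragment shader and with "zero_out" registered as an interleaved transform feedback varying. Each point drawn writes one vec4 of zeros, so the allocation is padded to a multiple of 16 bytes.

```cpp
#include <cstddef>

// Each transform feedback vertex writes one vec4 (4 floats = 16 bytes) of zeros.
constexpr std::size_t ZERO_FILL_STRIDE = 16;

// Round a requested size up so the buffer holds a whole number of vec4 writes.
constexpr std::size_t zero_fill_padded_size(std::size_t bytes) {
	return (bytes + ZERO_FILL_STRIDE - 1) / ZERO_FILL_STRIDE * ZERO_FILL_STRIDE;
}

// Number of points to draw so that every byte of the padded buffer is written.
constexpr std::size_t zero_fill_vertex_count(std::size_t bytes) {
	return zero_fill_padded_size(bytes) / ZERO_FILL_STRIDE;
}

// Vertex shader for the pass. "zero_out" must be registered with
// glTransformFeedbackVaryings() before the program is linked.
constexpr const char *ZERO_FILL_VERTEX_SHADER = R"(#version 300 es
out highp vec4 zero_out;
void main() {
	zero_out = vec4(0.0);
	gl_Position = vec4(0.0);
}
)";

#ifdef ZERO_FILL_WITH_GLES3 // Hypothetical opt-in macro; requires a live GLES3 context.
#include <GLES3/gl3.h>

// Hypothetical helper: allocates storage for `buffer` and zero-fills it on the GPU.
// `program` is assumed to be linked from the shader above plus a trivial fragment shader.
void gpu_buffer_allocate_zeroed(GLuint program, GLuint buffer, std::size_t bytes, GLenum usage) {
	// Allocate uninitialized storage; no CPU-side data is uploaded or mapped.
	glBindBuffer(GL_ARRAY_BUFFER, buffer);
	glBufferData(GL_ARRAY_BUFFER, GLsizeiptr(zero_fill_padded_size(bytes)), nullptr, usage);
	glBindBuffer(GL_ARRAY_BUFFER, 0);

	// Write zeros through transform feedback with rasterization disabled.
	glUseProgram(program);
	glEnable(GL_RASTERIZER_DISCARD);
	glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, buffer);
	glBeginTransformFeedback(GL_POINTS);
	glDrawArrays(GL_POINTS, 0, GLsizei(zero_fill_vertex_count(bytes)));
	glEndTransformFeedback();
	glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);
	glDisable(GL_RASTERIZER_DISCARD);
}
#endif
```

The size helpers compile on their own; the GL part is only built when the macro is defined. Padding the allocation avoids a partially written tail, since transform feedback does not record a primitive that would not fit in the remaining buffer space. A Forward+ equivalent is not covered by this sketch.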
If this enhancement will not be used often, can it be worked around with a few lines of script?
No, because it's abstracted away from the end user, and almost any buffer, multimesh, or particle system will exhibit the problem in some way.
Is there a reason why this should be core and not an add-on in the asset library?
We should definitely do this. It's pretty standard to have async GPU memory initialization without stalling, and it's very simple for us to implement. It will also make loading faster for the editor, games, buffers, and particles in general, not just for dynamic changes.
It's a win-win all around.