Use raw buffer pointers in `RenderingDevice` allocation APIs to avoid intermediary arrays #107486

Ivorforce · 2025-06-13T11:39:00Z

This change should slightly speed up render allocation calls. I haven't tested practical speed up, and there's some risk of regressions (if I made any mistakes). Editor seems to boot up fine though.

Background

I was looking at to_byte_array, because the function seemed fishy to me. If raw data is needed, it's usually to be copied around, and in that case we shouldn't need to make copies.

Indeed, I found that all callers of the function were in rendering, used to copy around raw data. I adjusted APIs to use raw pointers instead, which is more idiomatic for this use case.

The one possible downside to this change is that internal calls aren't safety checked for validity (w.r.t. size), because there's no additional size being passed on. Personally, this is what I'd expect from such API. Still, it would be possible to change it to a Span based API, using something like array.byte_span(), passing down an additional size information. I'm open to making this change based on feedback.

clayjohn

Would it be possible to use a Span here instead?

I am fine with using raw buffer pointers (in fact, we already do that in a couple of places). But I thought that Span would be the better choice since we want to move towards using it more broadly.

Ivorforce · 2025-06-16T09:29:29Z

Would it be possible to use a Span here instead?

I am fine with using raw buffer pointers (in fact, we already do that in a couple of places). But I thought that Span would be the better choice since we want to move towards using it more broadly.

I considered this.
However, the APIs make it possible to pass in an "empty initial value" (Vector() in the old version, nullptr here). This would be impossible with spans, because Span(nullptr, value_above_zero) is incorrect.
An alternative would be having two separate functions; one for "no initial value" (passing the size), and one for "with initial value", taking a Span. Would that be preferable?

clayjohn · 2025-06-16T14:55:11Z

However, the APIs make it possible to pass in an "empty initial value" (Vector() in the old version, nullptr here). This would be impossible with spans, because Span(nullptr, value_above_zero) is incorrect.

Wouldn't it be possible to just pass in a LocalVector() to use an empty initial value?

Ivorforce · 2025-06-16T15:00:36Z

Wouldn't it be possible to just pass in a LocalVector() to use an empty initial value?

The original code did pass Vector() for empty initial values, but also passed in a size separately to define how large the array should be.
We cannot start passing just Span in (without separate size), due to the problem mentioned above.
We could start passing Span (in instead of Vector), but keep the separate size parameter. This would be analogous to the old implementation, but without imposing a Vector should be used. I decided against this, but I'd be open to changing to this approach.

lawnjelly · 2025-06-16T17:18:36Z

Could we have a special (separate) method for Span to allow setting nullptr with a size (e.g. set_null(size = 0))? It might be useful in some cases to pass there, then have it checkable.

Another possibility if NULL is not allowed, you could have another special value e.g. 0xFFFFFFFF.

Or does it create too many error checking code possibilities that would slow down the Span ... 🤔

Ivorforce · 2025-06-16T17:27:12Z

Could we have a special (separate) method for Span to allow setting nullptr with a size (e.g. set_null(size = 0))?

I'd prefer not to do this. Imo, Span should always point to a valid array (or have size 0). Giving up this property would be unfortunate semantically — apart from potential added ifs, it would also necessitate reverting #107444. (Edit: I guess your separate factory constructor would work, but still, having Span have this special case makes me feel uneasy)

clayjohn · 2025-06-16T18:04:22Z

Could we have a special (separate) method for Span to allow setting nullptr with a size (e.g. set_null(size = 0))?

I'd prefer not to do this. Imo, Span should always point to a valid array (or have size 0). Giving up this property would be unfortunate semantically — apart from potential added ifs, it would also necessitate reverting #107444. (Edit: I guess your separate factory constructor would work, but still, having Span have this special case makes me feel uneasy)

I think we are talking about different things. lawnjelly and I are both specifically asking about a Span with nullptr data and a length of 0. We are not asking about Spans with nullptr data and a greater than 0 length.

All of those methods take a GPU allocation size already (either a count or a size in bytes). Then the length of the Vector data is used to determine if the memory should be pre-filled with the data passed in or if the memory should be left uninitialized. With a Span, it should operate the same. If the Span has a zero length, then the p_data argument is ignored, otherwise, check to see if the Span length matches the p_size_in_bytes and then fill the GPU allocation with the data in p_data

As a concrete example, look at texture_buffer_create

RID texture_buffer_create(uint32_t p_size_elements, DataFormat p_format, const Vector<uint8_t> &p_data = Vector<uint8_t>());

We can easily have a zero-sized Vector/Span, the only thing that can't be 0 is p_size_elements

Ivorforce · 2025-06-16T19:52:16Z

I think we are talking about different things. lawnjelly and I are both specifically asking about a Span with nullptr data and a length of 0. We are not asking about Spans with nullptr data and a greater than 0 length.

Ah, right, that would be the second option I'd offered in the OP (using byte_span) and above . Sorry about the confusion!

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input.
The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

clayjohn · 2025-06-16T20:45:33Z

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input. The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

I'm not sure its a duplication of information. The p_bytes_size are telling the command how much GPU memory to allocate, while the size of the Vector/Span is implicit to the CPU buffer being passed in and both are needed to validate that the buffer being passed in is the same size as the new allocation.

Ivorforce · 2025-06-16T23:28:58Z

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input. The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

I'm not sure its a duplication of information. The p_bytes_size are telling the command how much GPU memory to allocate, while the size of the Vector/Span is implicit to the CPU buffer being passed in and both are needed to validate that the buffer being passed in is the same size as the new allocation.

The duplication is more obvious when we unpack the Span and leadingly rename the variables:

void allocate(uint32_t p_size_bytes, Span<uint8_t> p_data) {
    ERR_FAIL_COND(p_data.size() != p_size_bytes);
}
// Same as...
void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
    ERR_FAIL_COND(p_actual_size != p_expected_size);
}

It's unusual to let callers pass an expected value and actual value both. Usually, you just pass the actual value, and it's assumed to be correct. If there's ambiguity, the caller is expected to perform the relevant checks.
I hope that clears up what I mean. That being said, I'm not opposed to using Span, if it keeps friction low.

lawnjelly · 2025-06-17T03:07:35Z

I think we are talking about different things. lawnjelly and I are both specifically asking about a Span with nullptr data and a length of 0. We are not asking about Spans with nullptr data and a greater than 0 length.

I did in this case actually mean "abusing" the Span to be able to pass more information... I'm realizing that it might be a more general case than just here where we want to pass a single span, but still offer some special values, to avoid passing extra params (or in this case avoiding passing a size).

Or some variation that is convertable to / from Span like a template class that Span is derived from that explicitly allows certain sentinel values (say zero page, allows NULL). Come to think of it, we might want to use Span (or span like) to pass offsets and size, in which case NULL should be allowable for such a structure.

Now here the exception would be if you e.g. wanted to allocate 1024 bytes on the GPU, but only fill 512 of them from a Span, in which case the combination of Span plus allocate size would make sense (do we do this?).

Anyway perhaps this is more a proposal than for a PR, and there is some backpush for having a Span that differs from e.g. STL.

clayjohn · 2025-06-17T05:39:51Z

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input. The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

I'm not sure its a duplication of information. The p_bytes_size are telling the command how much GPU memory to allocate, while the size of the Vector/Span is implicit to the CPU buffer being passed in and both are needed to validate that the buffer being passed in is the same size as the new allocation.

The duplication is more obvious when we unpack the Span and leadingly rename the variables:
void allocate(uint32_t p_size_bytes, Span<uint8_t> p_data) {
    ERR_FAIL_COND(p_data.size() != p_size_bytes);
}
// Same as...
void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
    ERR_FAIL_COND(p_actual_size != p_expected_size);
}
It's unusual to let callers pass an expected value and actual value both. Usually, you just pass the actual value, and it's assumed to be correct. If there's ambiguity, the caller is expected to perform the relevant checks. I hope that clears up what I mean. That being said, I'm not opposed to using Span, if it keeps friction low.

I think you are still missing how p_data is used in the function.

Take a look at vertex_buffer_create() for example:

The first line is:

ERR_FAIL_COND_V(p_data.size() && (uint32_t)p_data.size() != p_size_bytes, RID());

We only validate that p_data.size() is equal to p_size_bytes if p_data.size() is greater than zero. That's because it is a perfectly valid to request a vertex buffer with non-zero size, but pass in an empty Vector because you aren't initializing the data.

Expanding that as the following is therefore incorrect:

void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
    ERR_FAIL_COND(p_actual_size != p_expected_size);
}

It should be

void allocate(uint32_t p_size_to_allocate, uint32_t p_size_of_supplied_data, const uint8_t *p_data) {
    if (p_size_of_supplied_data > 0) {
        // Data was supplied, so it needs to be enough to fill the buffer
        ERR_FAIL_COND(p_size_to_allocate != p_size_of_supplied_data);
    }
}

Ivorforce · 2025-06-17T09:28:25Z

Anyway, I think I'm hearing you prefer the Span based version, so I'll update the PR accordingly!

Ivorforce · 2025-06-17T15:35:26Z

Anyway, I think I'm hearing you prefer the Span based version, so I'll update the PR accordingly!

Done.
This change allowed me to scale back the changes, down to those that are likely to have an actual impact (data was copied for the call).

I needed to make one addition to Span: .reinterpret<uint8_t> is needed to be able to conveniently interface with Vector<SomethingElse> and still pass to the uint8_t interfaces using vector.span().reinterpret<uint8_t>().

lawnjelly · 2025-06-17T16:24:19Z

modules/betsy/image_compress_betsy.cpp

-		memcpy(data.ptrw(), dxt1_encoding_table, 1024 * 4);
-
-		dxt1_encoding_table_buffer = compress_rd->storage_buffer_create(1024 * 4, data);
+		dxt1_encoding_table_buffer = compress_rd->storage_buffer_create(1024 * 4, Span(dxt1_encoding_table).reinterpret<uint8_t>());


As this is such a common case, would it be worth having some syntactic sugar like dxt1_encoding_table.byte_span() or something like that? 🤔

It's not that common. It's currently just this exact API that's re-interpreting data like this.
And i like having reinterpret as part of the name as this is what C++ calls this.

clayjohn

This looks really nice now!

servers/rendering/renderer_rd/storage_rd/texture_storage.cpp

…ermediary arrays on calls.

Repiteo · 2025-06-18T23:31:15Z

Thanks!

Ivorforce added this to the 4.x milestone Jun 13, 2025

Ivorforce requested review from a team as code owners June 13, 2025 11:39

Ivorforce added enhancement topic:rendering performance labels Jun 13, 2025

Ivorforce removed the request for review from a team June 13, 2025 11:39

Ivorforce changed the title ~~Use raw buffer pointers in RenderingDevice allocation APIs to avoid forcing callers to allocate intermediary arrays~~ Use raw buffer pointers in RenderingDevice allocation APIs to avoid intermediary arrays Jun 15, 2025

clayjohn reviewed Jun 16, 2025

View reviewed changes

Ivorforce force-pushed the rendering-no-alloc-api branch from 30579de to e45893d Compare June 17, 2025 15:33

Ivorforce requested a review from a team as a code owner June 17, 2025 15:33

lawnjelly reviewed Jun 17, 2025

View reviewed changes

clayjohn approved these changes Jun 18, 2025

View reviewed changes

clayjohn modified the milestones: 4.x, 4.5 Jun 18, 2025

lawnjelly approved these changes Jun 18, 2025

View reviewed changes

Ivorforce force-pushed the rendering-no-alloc-api branch from e45893d to e9cb2c0 Compare June 18, 2025 10:23

AThousandShips reviewed Jun 18, 2025

View reviewed changes

servers/rendering/renderer_rd/storage_rd/texture_storage.cpp Outdated Show resolved Hide resolved

Use Span<uint8_t> in RenderingDevice allocation APIs to avoid int…

2b36c79

…ermediary arrays on calls.

Ivorforce force-pushed the rendering-no-alloc-api branch from e9cb2c0 to 2b36c79 Compare June 18, 2025 10:31

AThousandShips approved these changes Jun 18, 2025

View reviewed changes

Repiteo merged commit ac6252c into godotengine:master Jun 18, 2025
20 checks passed

Ivorforce deleted the rendering-no-alloc-api branch June 18, 2025 23:39

This was referenced Jun 21, 2025

Fix buffer size calculations in lightmapper_rd.cpp to account for element sizes #107820

Merged

Unable to bake lightmaps on newest master build #107805

Closed

Uh oh!

Use raw buffer pointers in RenderingDevice allocation APIs to avoid intermediary arrays #107486

Use raw buffer pointers in RenderingDevice allocation APIs to avoid intermediary arrays #107486

Uh oh!

Conversation

Ivorforce commented Jun 13, 2025

Background

Uh oh!

clayjohn left a comment

Choose a reason for hiding this comment

Uh oh!

Ivorforce commented Jun 16, 2025

Uh oh!

clayjohn commented Jun 16, 2025

Uh oh!

Ivorforce commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lawnjelly commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ivorforce commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clayjohn commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ivorforce commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clayjohn commented Jun 16, 2025

Uh oh!

Ivorforce commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lawnjelly commented Jun 17, 2025

Uh oh!

clayjohn commented Jun 17, 2025

Uh oh!

Ivorforce commented Jun 17, 2025

Uh oh!

Ivorforce commented Jun 17, 2025

Uh oh!

lawnjelly Jun 17, 2025

Choose a reason for hiding this comment

Uh oh!

Ivorforce Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clayjohn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Repiteo commented Jun 18, 2025

Uh oh!

Uh oh!

Use raw buffer pointers in `RenderingDevice` allocation APIs to avoid intermediary arrays #107486

Use raw buffer pointers in `RenderingDevice` allocation APIs to avoid intermediary arrays #107486

Ivorforce commented Jun 16, 2025 •

edited

Loading

lawnjelly commented Jun 16, 2025 •

edited

Loading

Ivorforce commented Jun 16, 2025 •

edited

Loading

clayjohn commented Jun 16, 2025 •

edited

Loading

Ivorforce commented Jun 16, 2025 •

edited

Loading

Ivorforce commented Jun 16, 2025 •

edited

Loading

Ivorforce Jun 17, 2025 •

edited

Loading