Skip to content

Use raw buffer pointers in RenderingDevice allocation APIs to avoid intermediary arrays #107486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 18, 2025

Conversation

Ivorforce
Copy link
Member

This change should slightly speed up render allocation calls. I haven't tested practical speed up, and there's some risk of regressions (if I made any mistakes). Editor seems to boot up fine though.

Background

I was looking at to_byte_array, because the function seemed fishy to me. If raw data is needed, it's usually to be copied around, and in that case we shouldn't need to make copies.

Indeed, I found that all callers of the function were in rendering, used to copy around raw data. I adjusted APIs to use raw pointers instead, which is more idiomatic for this use case.

The one possible downside to this change is that internal calls aren't safety checked for validity (w.r.t. size), because there's no additional size being passed on. Personally, this is what I'd expect from such API. Still, it would be possible to change it to a Span based API, using something like array.byte_span(), passing down an additional size information. I'm open to making this change based on feedback.

@Ivorforce Ivorforce added this to the 4.x milestone Jun 13, 2025
@Ivorforce Ivorforce requested review from a team as code owners June 13, 2025 11:39
@Ivorforce Ivorforce removed the request for review from a team June 13, 2025 11:39
@Ivorforce Ivorforce changed the title Use raw buffer pointers in RenderingDevice allocation APIs to avoid forcing callers to allocate intermediary arrays Use raw buffer pointers in RenderingDevice allocation APIs to avoid intermediary arrays Jun 15, 2025
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use a Span here instead?

I am fine with using raw buffer pointers (in fact, we already do that in a couple of places). But I thought that Span would be the better choice since we want to move towards using it more broadly.

@Ivorforce
Copy link
Member Author

Would it be possible to use a Span here instead?

I am fine with using raw buffer pointers (in fact, we already do that in a couple of places). But I thought that Span would be the better choice since we want to move towards using it more broadly.

I considered this.
However, the APIs make it possible to pass in an "empty initial value" (Vector() in the old version, nullptr here). This would be impossible with spans, because Span(nullptr, value_above_zero) is incorrect.
An alternative would be having two separate functions; one for "no initial value" (passing the size), and one for "with initial value", taking a Span. Would that be preferable?

@clayjohn
Copy link
Member

However, the APIs make it possible to pass in an "empty initial value" (Vector() in the old version, nullptr here). This would be impossible with spans, because Span(nullptr, value_above_zero) is incorrect.

Wouldn't it be possible to just pass in a LocalVector() to use an empty initial value?

@Ivorforce
Copy link
Member Author

Ivorforce commented Jun 16, 2025

Wouldn't it be possible to just pass in a LocalVector() to use an empty initial value?

The original code did pass Vector() for empty initial values, but also passed in a size separately to define how large the array should be.
We cannot start passing just Span in (without separate size), due to the problem mentioned above.
We could start passing Span (in instead of Vector), but keep the separate size parameter. This would be analogous to the old implementation, but without imposing a Vector should be used. I decided against this, but I'd be open to changing to this approach.

@lawnjelly
Copy link
Member

lawnjelly commented Jun 16, 2025

Could we have a special (separate) method for Span to allow setting nullptr with a size (e.g. set_null(size = 0))? It might be useful in some cases to pass there, then have it checkable.

Another possibility if NULL is not allowed, you could have another special value e.g. 0xFFFFFFFF.

Or does it create too many error checking code possibilities that would slow down the Span ... 🤔

@Ivorforce
Copy link
Member Author

Ivorforce commented Jun 16, 2025

Could we have a special (separate) method for Span to allow setting nullptr with a size (e.g. set_null(size = 0))?

I'd prefer not to do this. Imo, Span should always point to a valid array (or have size 0). Giving up this property would be unfortunate semantically — apart from potential added ifs, it would also necessitate reverting #107444. (Edit: I guess your separate factory constructor would work, but still, having Span have this special case makes me feel uneasy)

@clayjohn
Copy link
Member

clayjohn commented Jun 16, 2025

Could we have a special (separate) method for Span to allow setting nullptr with a size (e.g. set_null(size = 0))?

I'd prefer not to do this. Imo, Span should always point to a valid array (or have size 0). Giving up this property would be unfortunate semantically — apart from potential added ifs, it would also necessitate reverting #107444. (Edit: I guess your separate factory constructor would work, but still, having Span have this special case makes me feel uneasy)

I think we are talking about different things. lawnjelly and I are both specifically asking about a Span with nullptr data and a length of 0. We are not asking about Spans with nullptr data and a greater than 0 length.

All of those methods take a GPU allocation size already (either a count or a size in bytes). Then the length of the Vector data is used to determine if the memory should be pre-filled with the data passed in or if the memory should be left uninitialized. With a Span, it should operate the same. If the Span has a zero length, then the p_data argument is ignored, otherwise, check to see if the Span length matches the p_size_in_bytes and then fill the GPU allocation with the data in p_data

As a concrete example, look at texture_buffer_create

RID texture_buffer_create(uint32_t p_size_elements, DataFormat p_format, const Vector<uint8_t> &p_data = Vector<uint8_t>());

We can easily have a zero-sized Vector/Span, the only thing that can't be 0 is p_size_elements

@Ivorforce
Copy link
Member Author

Ivorforce commented Jun 16, 2025

I think we are talking about different things. lawnjelly and I are both specifically asking about a Span with nullptr data and a length of 0. We are not asking about Spans with nullptr data and a greater than 0 length.

Ah, right, that would be the second option I'd offered in the OP (using byte_span) and above . Sorry about the confusion!

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input.
The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

@clayjohn
Copy link
Member

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input. The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

I'm not sure its a duplication of information. The p_bytes_size are telling the command how much GPU memory to allocate, while the size of the Vector/Span is implicit to the CPU buffer being passed in and both are needed to validate that the buffer being passed in is the same size as the new allocation.

@Ivorforce
Copy link
Member Author

Ivorforce commented Jun 16, 2025

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input. The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

I'm not sure its a duplication of information. The p_bytes_size are telling the command how much GPU memory to allocate, while the size of the Vector/Span is implicit to the CPU buffer being passed in and both are needed to validate that the buffer being passed in is the same size as the new allocation.

The duplication is more obvious when we unpack the Span and leadingly rename the variables:

void allocate(uint32_t p_size_bytes, Span<uint8_t> p_data) {
    ERR_FAIL_COND(p_data.size() != p_size_bytes);
}
// Same as...
void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
    ERR_FAIL_COND(p_actual_size != p_expected_size);
}

It's unusual to let callers pass an expected value and actual value both. Usually, you just pass the actual value, and it's assumed to be correct. If there's ambiguity, the caller is expected to perform the relevant checks.
I hope that clears up what I mean. That being said, I'm not opposed to using Span, if it keeps friction low.

@lawnjelly
Copy link
Member

I think we are talking about different things. lawnjelly and I are both specifically asking about a Span with nullptr data and a length of 0. We are not asking about Spans with nullptr data and a greater than 0 length.

I did in this case actually mean "abusing" the Span to be able to pass more information... I'm realizing that it might be a more general case than just here where we want to pass a single span, but still offer some special values, to avoid passing extra params (or in this case avoiding passing a size).

Or some variation that is convertable to / from Span like a template class that Span is derived from that explicitly allows certain sentinel values (say zero page, allows NULL). Come to think of it, we might want to use Span (or span like) to pass offsets and size, in which case NULL should be allowable for such a structure.

Now here the exception would be if you e.g. wanted to allocate 1024 bytes on the GPU, but only fill 512 of them from a Span, in which case the combination of Span plus allocate size would make sense (do we do this?).

Anyway perhaps this is more a proposal than for a PR, and there is some backpush for having a Span that differs from e.g. STL.

@clayjohn
Copy link
Member

If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input. The reason I didn't to this in the first place is to avoid information duplication, but it's really just a matter of taste imo.

I'm not sure its a duplication of information. The p_bytes_size are telling the command how much GPU memory to allocate, while the size of the Vector/Span is implicit to the CPU buffer being passed in and both are needed to validate that the buffer being passed in is the same size as the new allocation.

The duplication is more obvious when we unpack the Span and leadingly rename the variables:

void allocate(uint32_t p_size_bytes, Span<uint8_t> p_data) {
    ERR_FAIL_COND(p_data.size() != p_size_bytes);
}
// Same as...
void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
    ERR_FAIL_COND(p_actual_size != p_expected_size);
}

It's unusual to let callers pass an expected value and actual value both. Usually, you just pass the actual value, and it's assumed to be correct. If there's ambiguity, the caller is expected to perform the relevant checks. I hope that clears up what I mean. That being said, I'm not opposed to using Span, if it keeps friction low.

I think you are still missing how p_data is used in the function.

Take a look at vertex_buffer_create() for example:

The first line is:

ERR_FAIL_COND_V(p_data.size() && (uint32_t)p_data.size() != p_size_bytes, RID());

We only validate that p_data.size() is equal to p_size_bytes if p_data.size() is greater than zero. That's because it is a perfectly valid to request a vertex buffer with non-zero size, but pass in an empty Vector because you aren't initializing the data.

Expanding that as the following is therefore incorrect:

void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
    ERR_FAIL_COND(p_actual_size != p_expected_size);
}

It should be

void allocate(uint32_t p_size_to_allocate, uint32_t p_size_of_supplied_data, const uint8_t *p_data) {
    if (p_size_of_supplied_data > 0) {
        // Data was supplied, so it needs to be enough to fill the buffer
        ERR_FAIL_COND(p_size_to_allocate != p_size_of_supplied_data);
    }
}

@Ivorforce
Copy link
Member Author

Anyway, I think I'm hearing you prefer the Span based version, so I'll update the PR accordingly!

@Ivorforce Ivorforce force-pushed the rendering-no-alloc-api branch from 30579de to e45893d Compare June 17, 2025 15:33
@Ivorforce Ivorforce requested a review from a team as a code owner June 17, 2025 15:33
@Ivorforce
Copy link
Member Author

Anyway, I think I'm hearing you prefer the Span based version, so I'll update the PR accordingly!

Done.
This change allowed me to scale back the changes, down to those that are likely to have an actual impact (data was copied for the call).

I needed to make one addition to Span: .reinterpret<uint8_t> is needed to be able to conveniently interface with Vector<SomethingElse> and still pass to the uint8_t interfaces using vector.span().reinterpret<uint8_t>().

memcpy(data.ptrw(), dxt1_encoding_table, 1024 * 4);

dxt1_encoding_table_buffer = compress_rd->storage_buffer_create(1024 * 4, data);
dxt1_encoding_table_buffer = compress_rd->storage_buffer_create(1024 * 4, Span(dxt1_encoding_table).reinterpret<uint8_t>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is such a common case, would it be worth having some syntactic sugar like dxt1_encoding_table.byte_span() or something like that? 🤔

Copy link
Member Author

@Ivorforce Ivorforce Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that common. It's currently just this exact API that's re-interpreting data like this.
And i like having reinterpret as part of the name as this is what C++ calls this.

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really nice now!

@clayjohn clayjohn modified the milestones: 4.x, 4.5 Jun 18, 2025
@Ivorforce Ivorforce force-pushed the rendering-no-alloc-api branch from e45893d to e9cb2c0 Compare June 18, 2025 10:23
@Ivorforce Ivorforce force-pushed the rendering-no-alloc-api branch from e9cb2c0 to 2b36c79 Compare June 18, 2025 10:31
@Repiteo Repiteo merged commit ac6252c into godotengine:master Jun 18, 2025
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Jun 18, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants