-
-
Notifications
You must be signed in to change notification settings - Fork 22.8k
Use raw buffer pointers in RenderingDevice
allocation APIs to avoid intermediary arrays
#107486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
RenderingDevice
allocation APIs to avoid forcing callers to allocate intermediary arraysRenderingDevice
allocation APIs to avoid intermediary arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use a Span here instead?
I am fine with using raw buffer pointers (in fact, we already do that in a couple of places). But I thought that Span would be the better choice since we want to move towards using it more broadly.
I considered this. |
Wouldn't it be possible to just pass in a |
The original code did pass |
Could we have a special (separate) method for Another possibility if Or does it create too many error checking code possibilities that would slow down the |
I'd prefer not to do this. Imo, |
I think we are talking about different things. lawnjelly and I are both specifically asking about a Span with All of those methods take a GPU allocation size already (either a count or a size in bytes). Then the length of the Vector data is used to determine if the memory should be pre-filled with the data passed in or if the memory should be left uninitialized. With a Span, it should operate the same. If the Span has a zero length, then the As a concrete example, look at
We can easily have a zero-sized Vector/Span, the only thing that can't be 0 is |
Ah, right, that would be the second option I'd offered in the OP (using If this is preferred, I'm happy to swap it around. It should create less code churn, and keep around the safety checks vs the explicit length input. |
I'm not sure its a duplication of information. The p_bytes_size are telling the command how much GPU memory to allocate, while the size of the Vector/Span is implicit to the CPU buffer being passed in and both are needed to validate that the buffer being passed in is the same size as the new allocation. |
The duplication is more obvious when we unpack the void allocate(uint32_t p_size_bytes, Span<uint8_t> p_data) {
ERR_FAIL_COND(p_data.size() != p_size_bytes);
}
// Same as...
void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
ERR_FAIL_COND(p_actual_size != p_expected_size);
} It's unusual to let callers pass an expected value and actual value both. Usually, you just pass the actual value, and it's assumed to be correct. If there's ambiguity, the caller is expected to perform the relevant checks. |
I did in this case actually mean "abusing" the Or some variation that is convertable to / from Now here the exception would be if you e.g. wanted to allocate 1024 bytes on the GPU, but only fill 512 of them from a Anyway perhaps this is more a proposal than for a PR, and there is some backpush for having a |
I think you are still missing how Take a look at The first line is:
We only validate that Expanding that as the following is therefore incorrect: void allocate(uint32_t p_expected_size, uint32_t p_actual_size, const uint8_t *p_data) {
ERR_FAIL_COND(p_actual_size != p_expected_size);
} It should be void allocate(uint32_t p_size_to_allocate, uint32_t p_size_of_supplied_data, const uint8_t *p_data) {
if (p_size_of_supplied_data > 0) {
// Data was supplied, so it needs to be enough to fill the buffer
ERR_FAIL_COND(p_size_to_allocate != p_size_of_supplied_data);
}
} |
Anyway, I think I'm hearing you prefer the |
30579de
to
e45893d
Compare
Done. I needed to make one addition to |
memcpy(data.ptrw(), dxt1_encoding_table, 1024 * 4); | ||
|
||
dxt1_encoding_table_buffer = compress_rd->storage_buffer_create(1024 * 4, data); | ||
dxt1_encoding_table_buffer = compress_rd->storage_buffer_create(1024 * 4, Span(dxt1_encoding_table).reinterpret<uint8_t>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is such a common case, would it be worth having some syntactic sugar like dxt1_encoding_table.byte_span()
or something like that? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that common. It's currently just this exact API that's re-interpreting data like this.
And i like having reinterpret
as part of the name as this is what C++ calls this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really nice now!
e45893d
to
e9cb2c0
Compare
…ermediary arrays on calls.
e9cb2c0
to
2b36c79
Compare
Thanks! |
This change should slightly speed up render allocation calls. I haven't tested practical speed up, and there's some risk of regressions (if I made any mistakes). Editor seems to boot up fine though.
Background
I was looking at
to_byte_array
, because the function seemed fishy to me. If raw data is needed, it's usually to be copied around, and in that case we shouldn't need to make copies.Indeed, I found that all callers of the function were in rendering, used to copy around raw data. I adjusted APIs to use raw pointers instead, which is more idiomatic for this use case.
The one possible downside to this change is that internal calls aren't safety checked for validity (w.r.t. size), because there's no additional
size
being passed on. Personally, this is what I'd expect from such API. Still, it would be possible to change it to aSpan
based API, using something likearray.byte_span()
, passing down an additional size information. I'm open to making this change based on feedback.