DataTexture: Proposal to support partial update #30184


Open
agargaro opened this issue Dec 21, 2024 · 21 comments

Comments

@agargaro
Contributor

agargaro commented Dec 21, 2024

Description

Hi!

I am using DataTexture quite a lot to handle data with BatchedMesh and InstancedMesh2 (InstancedMesh + indirection).

In my case, I would like to update the color of only one instance (on mouse over), but sending the whole texture to the GPU is expensive because it's very large.

I tried using the WebGLRenderer.copyTextureToTexture method but it doesn't work when src and dest are the same texture (I might open a separate bug for this).

Anyway, this method is useless because BatchedMesh automatically sets the .needsUpdate flag, which triggers a full texture upload anyway.

It would be fantastic to have a partial update system like BufferAttribute's.

I know that it's an important change, but if you want I can help.

Thank you for all the work you do. 😄

Solution

Implementing an addUpdateRange method similar to that of BufferAttribute.

.addUpdateRange ( region : Box2 ) : this
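To illustrate the proposed semantics, here is a minimal, hypothetical sketch (plain JS, not three.js code) of how a texture could accumulate Box2-like regions and collapse them into one bounding rectangle for a single texSubImage2D call. The class and method names are illustrative only; a real implementation would mirror BufferAttribute more closely (e.g. keeping a list of ranges and clearing them after upload):

```javascript
// Hypothetical sketch: accumulate dirty regions on a texture, then merge
// them before upload. Box2-like objects are assumed to have the shape
// { min: { x, y }, max: { x, y } } with inclusive texel coordinates.
class UpdateRangeTracker {
  constructor() {
    this.ranges = [];
  }

  addUpdateRange(region) {
    this.ranges.push(region);
    return this;
  }

  // Collapse all pending regions into one bounding box, the simplest
  // strategy for issuing a single texSubImage2D call per frame.
  mergedRange() {
    if (this.ranges.length === 0) return null;
    const merged = {
      min: { x: Infinity, y: Infinity },
      max: { x: -Infinity, y: -Infinity },
    };
    for (const r of this.ranges) {
      merged.min.x = Math.min(merged.min.x, r.min.x);
      merged.min.y = Math.min(merged.min.y, r.min.y);
      merged.max.x = Math.max(merged.max.x, r.max.x);
      merged.max.y = Math.max(merged.max.y, r.max.y);
    }
    return merged;
  }

  clearUpdateRanges() {
    this.ranges.length = 0;
  }
}
```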

Alternatives

Fix WebGLRenderer.copyTextureToTexture and use it, but then we would need to remove .needsUpdate = true from BatchedMesh?

Additional context

No response

@RenaudRohlinger
Collaborator

Nice to keep this idea on the table. So essentially, adding x, y, width, height parameters to [compressed]texSubImage(2D/3D).

By the way, for your specific case, you could take this a step further by avoiding CPU-to-GPU stalls entirely through the use of Pixel Buffer Objects (PBOs). A new API in Three.js could be designed to enable asynchronous data transfers by leveraging gl.PIXEL_UNPACK_BUFFER. Instead of directly interacting with the GPU as we currently do, the CPU would write data to the PBO, which acts as a staging area, and the GPU would process the buffer asynchronously. This would eliminate blocking and probably improve the performance of your advanced BatchedMesh by a lot.

For example:

// Bind the PBO for writing (created on the first upload with `texImage2D` for example)
gl.bindBuffer(gl.PIXEL_UNPACK_BUFFER, pbo);

// Write pixel data to the PBO
const pixelData = new Uint8Array(bufferSize); // Your pixel data
gl.bufferSubData(gl.PIXEL_UNPACK_BUFFER, 0, pixelData);

// Upload the data from the PBO to the texture
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texSubImage2D(
    gl.TEXTURE_2D,
    0,                // Mipmap level
    0, 0,             // x and y offsets
    textureWidth,     // Width of the sub-image
    textureHeight,    // Height of the sub-image
    gl.RGBA,          // Format
    gl.UNSIGNED_BYTE, // Type
    0                 // Byte offset into the bound PBO
);

// Unbind the PBO
gl.bindBuffer(gl.PIXEL_UNPACK_BUFFER, null);

Basically:
No PBO:

CPU --> GPU (Direct transfer)
         [GPU busy, CPU stalls]

PBO:

CPU --> PBO (Asynchronous transfer)
PBO --> GPU (GPU reads later when ready)
         [CPU free, no stall]

Basically, the PBO serves as a staging area in GPU-accessible memory. Pixel data is first copied into the PBO, and the GPU reads from it at a later time. Spamming DataTexture updates would no longer stall the CPU: they are appended to the Pixel Buffer Object's command queue instead and processed asynchronously.

@gkjohnson
Collaborator

gkjohnson commented Dec 22, 2024

I think this would be a great addition, even for any texture. Uploading the whole image for just a single pixel is a waste. A couple of comments: I think the signature can be just a single box, right? As in .addUpdateRange ( box : Box2 )

I tried using the WebGLRenderer.copyTextureToTexture method but it doesn't work when src and dest are the same texture (I might open a separate bug for this).

I think the copyTextureToTexture mental model is different from what you're interested in. Due to copy operations, the contents of a texture can become out of sync with the GPU-memory representation, and copyTextureToTexture is designed to copy between GPU-memory instances. Adding addUpdateRange seems like a good, API-consistent addition.

I'm very supportive of this addition.

@RenaudRohlinger

Adding a new API in Three.js that would be designed to enable asynchronous data transfers by leveraging gl.PIXEL_UNPACK_BUFFER. Instead of directly interacting with the GPU as we currently do, the CPU would write data to the PBO, which acts as a staging area, and the GPU processes the buffer asynchronously.

This is interesting. Is there a reason to not always do this when uploading DataTextures? It seems like a strict improvement without drawbacks? Does it need a new three.js API?

@RenaudRohlinger
Collaborator

RenaudRohlinger commented Dec 22, 2024

This implies an additional array buffer (or two, one for writing and another for reading, to prevent blocking in lockstep) per DataTexture at initialization:

// Create a PBO
const pbo = gl.createBuffer();

// Bind it to PIXEL_UNPACK_BUFFER
gl.bindBuffer(gl.PIXEL_UNPACK_BUFFER, pbo);

// Allocate storage (size in bytes)
const bufferSize = textureWidth * textureHeight * 4; // Assuming RGBA, 4 bytes per pixel
gl.bufferData(gl.PIXEL_UNPACK_BUFFER, bufferSize, gl.STREAM_DRAW);

// Unbind the PBO
gl.bindBuffer(gl.PIXEL_UNPACK_BUFFER, null);

So I guess we would need to consider the performance benefit/cost before thinking about adding this to the core. But I agree that it sounds like a direct improvement.
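For a rough sense of that cost, here is a sketch of the extra staging memory this implies per DataTexture (assuming 4 bytes per texel, e.g. RGBA8; the function name is made up for illustration):

```javascript
// Rough sizing helper for the extra staging memory a PBO scheme would
// require per DataTexture, assuming 4 bytes per texel (e.g. RGBA8).
function pboOverheadBytes(width, height, doubleBuffered = true) {
  const bytesPerTexel = 4;
  const buffers = doubleBuffered ? 2 : 1; // one write buffer + one read buffer
  return width * height * bytesPerTexel * buffers;
}
```

For a 4096x4096 RGBA8 texture, double buffering would add 128 MiB of staging memory, which is why the benefit/cost trade-off matters.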

@gkjohnson
Collaborator

It's true that this would cause unnecessary resource creation. I think PBO support for faster uploads can be considered in a separate issue - perhaps a flag can be added to enable / disable this behavior.

Before any work is done on this I think it would be good to get opinions from @Mugen87 or @mrdoob. But I think this is a good change.

@agargaro
Contributor Author

I think the signature can be just a single box, right? As in .addUpdateRange ( box : Box2 )

Yes, sorry. I edited the post.

@RenaudRohlinger thanks for the information!

@Nmzik
Contributor

Nmzik commented Feb 4, 2025

Just to let you know, texSubImage2D with the D3D11 backend on Windows can trigger CPU readback, meaning it copies buffer memory from the GPU back to the CPU before copying it to the texture again (if supportsFastCopyBufferToTexture returns false). I remember trying to partially update a large (4096x4096) texture via PBO but didn't see any performance improvements; there were still framerate hitches. Again, I could be completely wrong, but that's what I see in the source code of the ANGLE project. Also, for future reference, compressedTexSubImage2D will ALWAYS trigger CPU readback from a PBO Link

Remark: I remember that WebGL calls don't directly invoke the OS graphics API. Instead, they are added to a ring buffer that is processed by a separate thread in the browser's renderer thread. So, the texSubImage2D command may return immediately on the JavaScript side but still perform a GPU-CPU copy in a separate thread. Then again, I might be overthinking it...

bool Renderer11::supportsFastCopyBufferToTexture(GLenum internalFormat) const
{
    const gl::InternalFormat &internalFormatInfo = gl::GetSizedInternalFormatInfo(internalFormat);
    const d3d11::Format &d3d11FormatInfo =
        d3d11::Format::Get(internalFormat, mRenderer11DeviceCaps);

    // sRGB formats do not work with D3D11 buffer SRVs
    if (internalFormatInfo.colorEncoding == GL_SRGB)
    {
        return false;
    }

    // We cannot support direct copies to non-color-renderable formats
    if (d3d11FormatInfo.rtvFormat == DXGI_FORMAT_UNKNOWN)
    {
        return false;
    }

    // We skip all 3-channel formats since sometimes format support is missing
    if (internalFormatInfo.componentCount == 3)
    {
        return false;
    }

    // We don't support formats which we can't represent without conversion
    if (d3d11FormatInfo.format().glInternalFormat != internalFormat)
    {
        return false;
    }

    // Buffer SRV creation for this format was not working on Windows 10.
    if (d3d11FormatInfo.texFormat == DXGI_FORMAT_B5G5R5A1_UNORM)
    {
        return false;
    }

    // This format is not supported as a buffer SRV.
    if (d3d11FormatInfo.texFormat == DXGI_FORMAT_A8_UNORM)
    {
        return false;
    }

    return true;
}

@Spiri0
Contributor

Spiri0 commented Apr 9, 2025

@agargaro I can't assess to what extent that's an option for you. With WebGPU, you can use SBOs instead of DataTextures, and they don't have the limitation you mentioned. I use them extensively. If it has to be WebGL, I can't help, but if you want to use SBOs with three.webgpu.js, I can. I recently converted my ocean repo to SBOs. They make the code significantly leaner and much more efficient. With SBOs, you can read and write without any problems.
To what extent it makes sense to invest more effort in enabling WebGL to do something that is significantly easier to achieve with the new WebGPU technology is more a matter of personal preference.

@agargaro
Contributor Author

agargaro commented Apr 9, 2025

Hi @Spiri0, thank you 😄

Currently I am still using WebGLRenderer but soon I will start to study WebGPURenderer as well, in order to migrate my InstancedMesh2 library.

But for those who are still using WebGLRenderer, I think the partial texture update can be very important, even if it is possible to implement it manually as I did here (I'm not an expert but maybe it can help someone).

If someone would like to provide some technical details, I could try to do a PR for only the WebGLRenderer if it makes sense.

Can I write to you privately (maybe on the forum?) to ask you some questions about what you were suggesting?

@Ctrlmonster

Hey, I ran into a similar need. I've been working on an LOD system using BatchedMesh, where each BatchedMesh represents one LOD for many geometries and a specific instanceId refers to the same geometry across LODs, e.g.

// enable lod0 on mesh 42 (assuming 3 lod levels)
batch0.setVisibleAt(42, true);
batch1.setVisibleAt(42, false); 
batch2.setVisibleAt(42, false);

But it turns out that this actually hurts performance compared to having no LOD at all: the constant batchedMesh.setVisibleAt() calls and the resulting per-frame uploads of each batch's _indirectTexture cost a lot more time than the reduced triangle count saves. So if partial updates were possible, that might help use cases like these :)
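One hypothetical way to shrink that upload, assuming the indirection texture stores one texel per instance in row-major order: track the dirty instance ids and upload only the smallest span of full rows that covers them, which keeps the sub-image rectangular for texSubImage2D (this helper is illustrative, not three.js API):

```javascript
// Illustrative helper: given dirty instance ids and the indirection
// texture's width (one texel per instance, laid out row-major), compute
// the smallest span of full rows covering every change. Uploading whole
// rows keeps the sub-image a simple rectangle for texSubImage2D.
function dirtyRowRange(dirtyIds, textureWidth) {
  if (dirtyIds.length === 0) return null;
  let minRow = Infinity;
  let maxRow = -Infinity;
  for (const id of dirtyIds) {
    const row = Math.floor(id / textureWidth);
    if (row < minRow) minRow = row;
    if (row > maxRow) maxRow = row;
  }
  return { x: 0, y: minRow, width: textureWidth, height: maxRow - minRow + 1 };
}
```

Toggling a single instance would then upload one row of texels instead of the whole texture.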

@Spiri0
Contributor

Spiri0 commented Apr 25, 2025

Use a large dst texture for the shader and a small src texture that only contains the data that needs to be updated in the dst. Then use copyTextureToTexture to transfer the data from the src to the dst. This works wonderfully and is very efficient for exactly what you have in mind. I speak from experience. This will update only the necessary parts of the dst texture, exactly as you want.


My LOD systems run at 120 fps. However, I don't use batched meshes for them because I don't consider them suitable for the intended purpose.

@RenaudRohlinger Do you remember that? You created this example at my request. It does exactly what this topic is about, and it works great.

"replace parts of an existing texture with all data of another texture"

https://threejs.org/examples/?q=part#webgpu_textures_partialupdate

This works the same with WebGL

@Ctrlmonster

Hey @Spiri0, thanks for sharing your approach! Will definitely keep it in mind. The reason I'm hoping to use BatchedMesh for this, is that it solves the draw call problem much more effectively than InstancedMesh. With InstancedMesh the draw call count can still get into the hundreds (one per geometry) and since BatchedMesh already handles visibility itself, the only way around would be to fork/copy-paste the source.

@Spiri0
Contributor

Spiri0 commented Apr 25, 2025

If your approach with batched meshes works well for you, then it's a good solution. To each his own.

The technique of loading data into a chunk DataTexture, updating it to send the data to the GPU, and then copying it to the target texture using copyTextureToTexture is exactly what you're looking for. I'm glad if it helps.

@RenaudRohlinger @Mugen87 The example by RenaudRohlinger is essentially the answer to this topic. This works equally well in WebGL and WebGPU.
Here again the link:
https://threejs.org/examples/?q=part#webgpu_textures_partialupdate

@CodyJasonBennett
Contributor

CodyJasonBennett commented Apr 25, 2025

That doesn't work as well as you think it does. Think of how you commit memory to the GPU in the first place and then use it. Three.js forces a stall in the worst case since it has no scheduler to juggle WIP memory yet creates and uses memory in the same frame. This gets dangerous as memory throughput increases (and/or frequency in dynamic cases). Thereafter, a GPU <-> GPU copy is fast, but it's the worst case we are concerned about, which does not improve otherwise.

I already started work on partial updates to textures and applied to BatchedMesh in #30998. Generally, I am not a fan of the asymmetry between the future renderers and three core to skirt API discourse. I've been vocal about that for years, and this is an emergent example of the need to maintain the core API to support the rest of the project, modern renderers included. Of course, support will be mirrored if the API and approach are generally sound.

@Spiri0
Contributor

Spiri0 commented Apr 25, 2025

That doesn't work as well as you think it does.

The projects I use it in work very well and are very complex projects. But if you have a more advanced technique, then I welcome it.

@CodyJasonBennett
Contributor

If you think your projects could lend to future testing, it would be greatly appreciated if you could try there as work comes in. Otherwise, you should not expect regression here, but rather improvement in all renderers. For fully dynamic use cases like in games, unfortunately there is no perfect solution, but this is already a big step. I think a larger discussion is warranted if three.js wants to open up to inter-frame lifecycle for smooth streaming at the expense of single-frame determinism, but this would be a prerequisite.

@Spiri0
Contributor

Spiri0 commented Apr 26, 2025

If you think your projects could lend to future testing, it would be greatly appreciated if you could try there as work comes in.

I'd be happy to test it when the time comes.
Here screenshots from my apps: (Mars resolution: 3 maps each with 131k) I could also use 500k or over 1M for each map if I had that. The key thing is that I can create quite detailed landscapes with Threejs, and with such efficiency that there's still plenty of room for shadows, atmosphere, and more.

For the ship, I upscaled the textures to 32k because I couldn't find anything with that high a resolution. Therefore, the texture still looks stair-stepped.

Partially updating textures works very well with copyTextureToTexture, but as I said, I'm always open to new ideas. And if you're working on something that's better, even if it already runs at 120 fps for me, I'll be happy to test and use it.

@gkjohnson
Collaborator

I understand that there are workarounds, but my feeling is that this issue should be about the ergonomics of updating a portion of a texture (as you can already do with geometry data, to a meaningful benefit) and how BatchedMesh can (and should) benefit from any performance gains for "free". Due to the architecture of three.js, BatchedMesh cannot use copyTextureToTexture internally to update a sub-texture each frame. As far as I understand, WebGPU may make this simpler to do, but I think it remains to be seen whether WebGPURenderer will be able to afford these kinds of benefits until WebGL is considered more defunct.

I know @agargaro has already done a lot of work and testing in this area, and it would be great if we could get some more concrete use cases and numbers relating to the benefits of a feature like this. I know it can be hard to measure these kinds of upload times, but a small demo showing framerate differences when updating one pixel in the matrix / color texture with and without partial texture updates would be great.

@CodyJasonBennett
Contributor

I already have work for this in #30998, applied to BatchedMesh. Some prior art would also be spite/THREE.UpdatableTexture, but note I am using WebGL2 features. I've left it as a draft until I figure out a good way to leverage it for BatchedMesh's indirection texture. The slowdown reported there when methods like setVisibleAt are called prompted me to investigate, as I know the indirection table is also affected by this sorting, but the color and matrices textures are not. It is not enough to only address the color and matrices textures.

It would be worthwhile if we could arrive at a solution that benefits @Ctrlmonster's case, where they want to use setVisibleAt for a level, perhaps without built-in sorting or using app-specific clustering between multiple BatchedMesh. This should be directly comparable to using methods like setMatrixAt and setColorAt, which may make for a more playful example. Noting some recent regressions, I think it's important a feature includes an example for quality control purposes. I can, of course, contribute one.

Regarding the API, it would have to be quite different to support a third level for 3D textures, but I struggle to see how this is useful or desirable in practice. When are you ever going to update a LUT from host memory? It should be compressed anyway, or better to use a fitted function for mobile. I have mirrored the API from BufferAttribute and implemented it for 2D textures only.
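As a rough illustration of what a 2D partial upload involves when gl.pixelStorei(gl.UNPACK_ROW_LENGTH, ...) is not used: the dirty region's rows have to be copied out of the full image buffer into a tightly packed sub-buffer before calling texSubImage2D. This helper is a sketch under that assumption, not code from the PR:

```javascript
// Sketch of the row extraction a 2D partial upload needs when the full
// image data is kept in one typed array: copy each row of the dirty
// region into a tightly packed sub-buffer suitable for texSubImage2D.
// `region` is { x, y, width, height } in texels; names are illustrative.
function extractRegion(fullData, textureWidth, region, channels = 4) {
  const { x, y, width, height } = region;
  const out = new fullData.constructor(width * height * channels);
  for (let row = 0; row < height; row++) {
    const srcStart = ((y + row) * textureWidth + x) * channels;
    const dstStart = row * width * channels;
    out.set(fullData.subarray(srcStart, srcStart + width * channels), dstStart);
  }
  return out;
}
```

With WebGL2's UNPACK_ROW_LENGTH / UNPACK_SKIP_PIXELS / UNPACK_SKIP_ROWS pixel-store parameters, this CPU-side copy can be skipped and the driver reads the region directly out of the full buffer.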

@Ctrlmonster

Hey, just wanted to share that I've also done some more tests with a lot of help from @agargaro. We were wondering why the frame time was so long, even though the texture was getting updated each frame anyway (regardless of setVisibleAt) due to sorting and per-instance frustum culling being enabled.

So my preliminary conclusion (take this with a grain of salt) was that on low-end devices (I'm testing on a Thinkpad T480s) the texture uploads due to visibility changes really start adding up, once you add multiple BatchedMeshes to the scene (around 15-20 in my case). My LOD case should probably be re-organized so that each LOD geometry gets stored within a single BatchedMesh, but there are other valid reasons for having multiple BatchedMeshes, like having many different texture sets and using one BatchedMesh for each.

I'm a bit short on time right now, but maybe I can make a demo for this in the coming days, as getting some harder numbers to look at would certainly be helpful. I think @agargaro has already seen performance gains by using their SquareDataTexture, mentioned in the OP, for matrix and color updates. That works similarly to @CodyJasonBennett's PR, if I'm not mistaken.
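For context, a square data texture of this kind typically maps an instance id to a texel coordinate roughly like this (an illustrative helper under the assumption of row-major layout with a fixed number of texels per instance, not the actual SquareDataTexture API):

```javascript
// Illustrative mapping for a square data texture: each instance occupies
// `pixelsPerInstance` consecutive texels, laid out row-major. Returns
// the texel coordinate of the instance's first pixel, i.e. the origin
// of the region a partial update for that instance would touch.
function instanceTexel(instanceId, pixelsPerInstance, textureWidth) {
  const texelIndex = instanceId * pixelsPerInstance;
  return {
    x: texelIndex % textureWidth,
    y: Math.floor(texelIndex / textureWidth),
  };
}
```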

For setVisibleAt the performance gains might not be as free – implementation wise – but achieving similar gains there would be very worthwhile, given that in a lot of apps the visibility inside a BatchedMesh will change each frame and result in a texture upload.

@gkjohnson
Collaborator

@CodyJasonBennett

I've left it as a draft until I figure out a good way to leverage it for BatchedMesh's indirection texture. The slowdown reported there when methods like setVisibleAt prompted me to investigate, as I know the indirection table is also affected by this sorting, but color and matrices textures are not. It is not enough to only address color and matrices textures.

I think it's fair to say that even just addressing color and matrix textures should be a meaningful improvement. I don't necessarily want to block on what would be a good API and performance boost because it's not perfect. That said it would also be good to have some concrete numbers using something like framerate on the improvements in basic cases (eg no sorting) to see how this plays out. I'm curious to how the post-processing range "compression" impacts update time per frame in #30998 compared to the data upload-time benefits.

But here are some of the different problems that are clearly in-optimal and I think can be tackled separately:

  1. Matrix and color textures are unnecessarily uploaded even when just a single pixel has changed.
  2. The "indirect" texture (supporting sorting, visibility toggles, and culling) is fully re-uploaded even when it may have changed only minimally, or not at all.
  3. Matrix and indirect textures are created and uploaded even when not needed, possibly incurring some meaningful performance impact (related to WEBGPU: make batchMesh._matricesTexture optional #30990).

#1 should be addressed with #30998. And at least a basic optimization for #2 would be to just upload the pixels associated with instances that will actually be drawn, which should be sequential. In non-toy cases this could be significant depending on how many objects are frustum culled.
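A sketch of that basic optimization for #2, assuming one texel per instance: since the drawn instances occupy the first drawnCount texels of the indirect texture, only the rows containing them need re-uploading (illustrative helper, not three.js API):

```javascript
// Sketch of uploading only the drawn portion of the indirect texture:
// drawn instances occupy the first `drawnCount` texels (row-major), so
// only the rows containing them need re-uploading. Returns a region in
// the { x, y, width, height } shape texSubImage2D expects.
function drawnRegion(drawnCount, textureWidth) {
  if (drawnCount === 0) return null;
  const rows = Math.ceil(drawnCount / textureWidth);
  return { x: 0, y: 0, width: textureWidth, height: rows };
}
```

With heavy frustum culling, drawnCount can be a small fraction of capacity, so the saved bandwidth scales with how much is culled.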

@Ctrlmonster

as getting some harder numbers to look at would certainly be helpful

This would be great.

My LOD case should probably be re-organized so that each LOD geometry gets stored within a single BatchedMesh, but there are other valid reasons for having multiple BatchedMeshes, like having many different texture sets and using one BatchedMesh for each.

You can easily swap which geometry an instance is rendering by using the setGeometryIdAt function. Regarding separate materials and textures - this should hopefully not be considered a long term issue. @agargaro and I have done a number of multi-material demos with BatchedMesh (see here), which should only get easier to work with with node materials.

@CodyJasonBennett
Contributor

I split away the changes to BatchedMesh from supporting partial updates in Texture with #30998. It's now up for review.

That should allow us to continue with no. 1 and no. 2 as we like. I have had help from @agargaro, and they expressed interest in continuing this work.

The review in #30998 (comment) might open the door to supporting 3D textures. I want to be careful here since this API already exists in BufferAttribute, but an overload is possible.


8 participants