Camera Driven Rendering #19704
-
Anything that makes compositing more intentional and well thought out has my full support. It interacts with a lot of areas (rendering, UI, picking), and many people don't understand how it works, how powerful it is, or that key features like picking support it. Wearing my picking hat, the thing we will need to maintain through these changes is the ability for a pointer to be on some rectangular composited surface, with the layers composited onto that surface explicitly ordered and able to be hooked into by picking backends. This is what makes it possible today to composite cameras onto a surface and to have a single pointer over multiple cameras, with their …
-
I really like this. I think anything graph-structured like this really begs for editor tooling, so it's great to see this all being driven by ECS state.
-
I see the motivation here from a "graphics internals" developer perspective. The current camera system was built to be a high-ish level API that "just works" for developers, even those who don't have graphics programming experience (obviously "just works" is a matter of perspective and scenario). Before committing to a path like this I'd like to see a proposal for how we will make this approachable and ergonomic for non-graphics-devs for the "high traffic" scenarios like:
As in: what will the user-facing code look like (and how will it behave under the hood)? Needing to manually unwire and rewire cameras from a compositor feels very "mid-level API" to me, and is not what most developers will expect coming from other engines. These problems seem surmountable. But I'd want solid high-level UX sorted out (as in ... competitive with the current impl) prior to committing to this path.
-
I really like the direction here. I'm probably down to go this way instead of with my draft, though I think there's still some good ideas there we can adapt. Bunch of assorted thoughts incoming:
- I think we should separate the physical metaphor parts of …
- IIUC, I agree doing ordering/composition authoritatively from the top down is the way to go. In my draft that takes the form of a tree, but a graph works too. Besides the tooling benefits, that should let us simplify a lot of the implicit ordering stuff we have on cameras now, like the atomics on …
- Speaking of …
- Regarding encouraging a shift from render graphs to camera graphs, it'd be good to figure out how we want users to interact with each / what we want them to be able to do. Things like: should we constrain what an individual camera graph can/should do so that we can improve their ergonomics? Does the ability for any plugin to mess with/add to engine render graphs still matter as much if they can edit the camera graph instead?
-
I'm not much of a render pipeline guy, but I would like to speak from the perspective of the needs of compositing in UI, and hopefully this won't be too tangential. This is coming from someone who has both long experience on the web and who has actually written a browser.

A subtle yet critical feature of CSS which many people aren't aware of is "implicit compositing". CSS normally composites elements onto a single surface, but will create additional surfaces when it needs to, based on styles. If an element has certain style properties, such as transforms (scaling and rotation) or post-processing effects (such as blur), CSS brings into play a more complex rendering scheme in which the element and its children are first rendered onto a separate surface and then composited onto the parent's surface. This all happens in a way which is transparent to the webdev, although there are various known recipes that can intentionally force this behavior.

A very simple use case for this is animating opacity: if you have a dialog or popup (such as a character inventory screen or settings mode) you may want it to "fade in". But individually setting the opacity of the popup's root entity and each child entity gives you the wrong answer, and looks rather ugly. Instead, what you want in this case is to opaquely composite the popup and all its children, and then animate the opacity of the composited result.

Now, I don't think we need to have this sort of thing be completely automatic and invisible as it is in CSS, but it would be nice if it were easy to do, perhaps by inserting the right components at some point in the entity hierarchy. Ideally, it should be little effort for the developer to say "this sub-tree of the UI is composited onto a buffer", and that includes having picking work as expected. One challenge with this is that Bevy doesn't know how large a buffer we'll actually need, but I'm OK with the developer having to supply this information as a hint up front, since (for all the use cases I can think of) the value is quite predictable. See also: #6956
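A purely illustrative sketch of what that could look like (the `CompositeSurface` component here is hypothetical, not an existing Bevy API):

```rust
use bevy::prelude::*;

/// Hypothetical marker: "composite this UI sub-tree onto its own buffer".
/// Neither the component nor the behavior exists in Bevy today.
#[derive(Component)]
struct CompositeSurface {
    /// Buffer size, supplied up front as a hint since the engine can't predict it.
    size: UVec2,
    /// Opacity applied to the composited result as a whole.
    opacity: f32,
}

fn spawn_inventory_popup(mut commands: Commands) {
    commands
        .spawn((
            Node::default(),
            CompositeSurface { size: UVec2::new(512, 512), opacity: 0.0 },
        ))
        .with_children(|popup| {
            // ... inventory slots, icons, text: all composited as one group ...
            popup.spawn(Node::default());
        });
}
```

A fade-in would then animate the single `opacity` value instead of the opacity of every descendant, and picking would treat the composited buffer like any other surface.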
-
Bevy's rendering APIs have been described as "camera driven" several times, but it's not always clear exactly what that means. In Bevy's rendering system, the "camera" entity has a privileged position and provides the following behaviors/data:
- an intermediate texture to render into (the `ViewTarget`)
- filtering of which entities it renders (`RenderLayers`)
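To ground this, here is a hedged sketch of what a single camera entity carries with today's API (component names as of recent Bevy releases; exact fields and their locations have shifted between versions):

```rust
use bevy::prelude::*;
use bevy::render::view::RenderLayers;

fn setup(mut commands: Commands) {
    // One entity configures the view, its compositing order, its output
    // settings, and which entities it renders.
    commands.spawn((
        Camera3d::default(),
        Camera {
            order: 0,  // position in the implicit compositing order across cameras
            hdr: true, // actually configures the *internal* texture (see below)
            ..default()
        },
        RenderLayers::layer(0), // which entities this camera "sees"
    ));
}
```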
Mixed metaphors
Some discussion in #16248 brings up a number of metaphors for what a camera is. One framing is that the `RenderTarget` is the film, though this is conceptually imprecise because of the `hdr` field, which hints at the presence of the internal texture; that discussion also introduces the idea that the camera is split into two parts, the camera and the lens.

UI presents some other challenges to the metaphor. While UI does have an implicit orthographic projection and is superficially similar to 2d rendering in many ways, it raises questions as to what a UI camera is "looking at" and being rendered on.
Currently, a UI camera is a virtual view ("subview") that is tied to a 2d/3d camera. In this sense, if the lens determines "how" the scene looks, the UI camera is like an additional filter or color gel that is placed in front of the camera, i.e. not really a camera at all.
In #15256, when discussing world-space UI, aevyrie argues that the camera metaphor obscures possible implementations, where it could make sense to parent a render surface to some other entity already in worldspace and have things "just work."
Other proposals, such as a hypothetical `CameraFullscreen`, which would be a way to run a render graph with no geometry (i.e. a simple way for users to write fullscreen shaders), continue to stretch the metaphor.

The problem with compositing
I'd like to argue that the idea of camera driven rendering is fundamentally sound, but suffers from a critical conceptual ambiguity with respect to what the film medium is. More precisely, the fact that the camera both captures and composites is a significant problem for the API, particularly when using multi-camera setups.
The hidden "internal texture"
Importantly, `RenderTarget` is not the film; it is something more like the print the film is developed onto. `CameraOutputMode` is the developer/fixer. The film is, in our current API, not directly exposed to the user.

Every `ExtractedView` has a `ViewTarget`, which contains two logical textures: the "main" texture, which is used as the color attachment for most render passes, and the "out" texture, which is the `RenderTarget`, typically a swapchain texture. Importantly, the out texture is only used in the final step of the render graph, where the upscaling node blits (i.e. composites) the main texture to the out texture. In other words, the user never sees the main texture itself, which is why it can be said to be "internal."

Jasmine notes in #16248 that this is particularly confusing because, for example, the `hdr` field on `Camera` actually has nothing to do with the `RenderTarget`. This has also led to the proliferation of more niche components like `CameraMainTextureUsages` that allow configuring the internal texture for other uses in the render graph.
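As a rough mental model of the internal texture and the final blit (an illustration only; these are not Bevy's actual types):

```rust
/// Illustrative stand-in for a GPU texture, not a real Bevy or wgpu type.
struct GpuTexture;

/// Conceptual shape of a view's two logical textures.
struct ViewTargetSketch {
    /// The hidden "film": the color attachment most render passes draw into.
    main: GpuTexture,
    /// The "print": the camera's RenderTarget, typically the swapchain texture.
    out: GpuTexture,
}

/// The out texture is only written at the very end of the render graph,
/// where the upscaling node blits (i.e. composites) main onto out.
fn upscaling_node(view: &ViewTargetSketch) {
    blit(&view.main, &view.out);
}

/// Placeholder for the blit pass itself.
fn blit(_src: &GpuTexture, _dst: &GpuTexture) {}
```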
The sharp edges of multi-cam
Users consistently run into issues when using multiple cameras. When two cameras share the same HDR and MSAA settings, the renderer will "helpfully" re-use the same cached texture for both of them, and will disable clearing of that texture for every camera after the first. This is potentially a performance win, and in many cases it results in the behavior users expect, where one camera can easily draw on top of another camera's output, but it has a number of unfortunate consequences:
Additionally, this texture is generally not configurable, which poses issues for more niche uses that require different texture formats or would like to use the texture in other contexts.
Proposal: Camera Graph
My proposal is that we embrace camera driven rendering by understanding compositing as another kind of camera. More specifically, I want to argue that we should understand cameras as forming a kind of graph that has both inputs and outputs.
Another way to put this is that a camera should be considered a logical render pass. This is the `CameraSubGraph` component / the "lens" of the camera. Rather than imagining that the user should configure a single monolithic render graph that accomplishes all their needs in a single camera, we should encourage users to create multiple cameras.

Making the relationship between cameras itself a graph can help define how textures (and potentially other resources) should flow through rendering at a more coarse-grained level, and it makes creative decisions with respect to compositing explicit. Users who want fine-grained control for maximum performance and resource efficiency can still configure a single camera/render graph.

By having cameras accept texture inputs and making compositing a separate step, we can drastically simplify the conceptual model: cameras have film, and they can also accept film from another camera to do a double exposure. By making the actual render texture explicit, I think it will be easier to teach patterns for multi-camera rendering. And, while configuring multiple cameras may be a bit of a pain today, this kind of pattern is well suited for asset-driven configuration (BSN) and editor tooling.
API Sketch
This isn't intended as a concrete proposal but just a sketch of what an API might look like:
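For example, with entirely hypothetical component names (`Film`, `CameraInputs`, and `CompositingCamera` below are invented for illustration, not existing or proposed Bevy APIs):

```rust
use bevy::prelude::*;

/// Hypothetical: the explicit texture ("film") a camera renders onto.
#[derive(Component)]
struct Film(Handle<Image>);

/// Hypothetical: textures this camera consumes from other cameras.
#[derive(Component)]
struct CameraInputs(Vec<Handle<Image>>);

/// Hypothetical marker for a camera whose "lens" is just compositing.
#[derive(Component)]
struct CompositingCamera;

fn setup(mut commands: Commands, mut images: ResMut<Assets<Image>>) {
    // Films for the scene and the UI (size/usage configuration omitted).
    let scene_film = images.add(Image::default());
    let ui_film = images.add(Image::default());

    // Each "lens" camera is a logical render pass writing onto its own film.
    commands.spawn((Camera3d::default(), Film(scene_film.clone())));
    commands.spawn((Camera2d, Film(ui_film.clone())));

    // The compositing camera accepts the films as inputs and writes its own
    // output to the window (its RenderTarget).
    commands.spawn((
        CompositingCamera,
        Camera::default(),
        CameraInputs(vec![scene_film, ui_film]),
    ));
}
```

The plain single-camera case would then amount to one camera with no explicit inputs, possibly relying on a default compositing camera for the window output (see Drawbacks below).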
As a logical graph (one possible flow for the hypothetical setup above):
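```text
Camera3d "scene" ──film──┐
                         ├──> CompositingCamera ──out──> Window (RenderTarget)
Camera2d "UI"    ──film──┘
```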
Drawbacks
- … a `CompositingCamera` in the scene (by default a camera's output goes to that input), but it makes the default case a bit more complicated.
- … `CameraSubGraph` and passes storage buffers into the next camera, i.e. making cameras also accept buffers as inputs/outputs.