Question about control sd3 #11867

Open
@Henry-Bi

Description

I've noticed a potential inconsistency in how the VAE-encoded control_image is processed between the training script for ControlNet with Stable Diffusion 3 and the corresponding inference pipeline.

In the inference pipeline (pipeline_stable_diffusion_3_controlnet.py):

The control_image latent has vae_shift_factor subtracted from it and is then multiplied by the scaling_factor:

control_image = (control_image - vae_shift_factor) * self.vae.config.scaling_factor

In the training script (train_controlnet_sd3.py):

However, in the provided training example, the VAE-encoded controlnet_image is only multiplied by the scaling_factor, without subtracting the shift_factor.

controlnet_image = controlnet_image * vae.config.scaling_factor

Shouldn't the training script also apply the vae_shift_factor to maintain consistency with the inference process? It seems the correct implementation in the training script should be:

controlnet_image = (controlnet_image - vae.config.shift_factor) * vae.config.scaling_factor
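
To make the size of the mismatch concrete, here is a minimal sketch in plain Python. `normalize_latent` is a hypothetical helper (not a diffusers function), and the scaling and shift values are illustrative assumptions; the real ones should be read from vae.config. Under those assumptions, the two code paths differ by a constant offset of shift_factor * scaling_factor on every latent value.

```python
# Hypothetical helper for illustration only; it is not part of diffusers.
def normalize_latent(latent, scaling_factor, shift_factor=0.0):
    """Return (x - shift_factor) * scaling_factor for each value in `latent`."""
    return [(x - shift_factor) * scaling_factor for x in latent]


# Illustrative values (assumptions); read the real ones from vae.config.
SCALING_FACTOR = 1.5305
SHIFT_FACTOR = 0.0609

# Stand-in for a few values of a VAE-encoded control image.
latent = [-1.0, 0.0, 0.5, 2.0]

# Inference-style normalization: subtract the shift, then scale.
inference_style = normalize_latent(latent, SCALING_FACTOR, SHIFT_FACTOR)

# Training-style normalization as described in this issue: scale only.
training_style = normalize_latent(latent, SCALING_FACTOR)

# Per-element difference between the two code paths.
offsets = [t - i for t, i in zip(training_style, inference_style)]

# Algebraically, x*s - (x - shift)*s = shift * s for every x.
expected_offset = SHIFT_FACTOR * SCALING_FACTOR

print(offsets)
print(expected_offset)
```

If the sketch reflects the real behavior, a ControlNet trained with the scale-only form would see control latents at inference that are uniformly shifted relative to what it saw during training, unless vae_shift_factor happens to be 0 in the pipeline.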
