Description
Hi, I reimplemented this code for training. However, the FitDiT LoRA only trains correctly with the noise loss alone. When I add a frequency loss, reconstructing the predicted image becomes a problem:
noise_pred = net(
    hidden_states=torch.cat(
        [noisy_model_input, vton_model_input, batch["mask_input"].to(dtype=weight_dtype)],
        dim=1,
    ),
    timesteps=timesteps,
    pooled_projections=cloth_image_embeds,
    pose_image=batch["pose_image"].to(dtype=weight_dtype),
    ref_key=ref_key,
    ref_value=ref_value,
    encoder_hidden_states=None,
    return_dict=False,
)
# mask shape: torch.Size([1, 1, 128, 96])
# One-step estimate of the clean latent; this form assumes the network output
# is the flow-matching velocity (noise - x0), so no division by (1.0 - sigmas).
x0_pred = noisy_model_input - sigmas * noise_pred  # / (1.0 - sigmas)
# Undo the VAE latent scaling and shift before decoding to pixel space.
x0_pred = (x0_pred / vae.config.scaling_factor) + vae.config.shift_factor
pixel_x0_pred = vae.decode(x0_pred, return_dict=False)[0]
I believe SD3 was trained with flow matching, so I modified the x0 computation slightly. However, the predicted images are blurry, which affects the whole training process. Do you have any advice or help for me?
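For reference, here is a minimal sketch of the x0 recovery this question depends on. It assumes the forward process `x_t = (1 - sigma) * x0 + sigma * noise` and a velocity target `v = noise - x0`, which is my understanding of the SD3 flow-matching setup rather than something confirmed for this repository. The function names are hypothetical, and scalars stand in for latents (the same arithmetic applies elementwise to tensors).

```python
import random


def add_noise(x0, noise, sigma):
    # Assumed forward process: x_t = (1 - sigma) * x0 + sigma * noise
    return (1.0 - sigma) * x0 + sigma * noise


def x0_from_velocity(x_t, v_pred, sigma):
    # If the network predicts v = noise - x0, then x0 = x_t - sigma * v.
    return x_t - sigma * v_pred


def x0_from_noise(x_t, eps_pred, sigma):
    # If the network predicts the noise itself, x0 = (x_t - sigma * eps) / (1 - sigma).
    # This form is undefined at sigma == 1.
    return (x_t - sigma * eps_pred) / (1.0 - sigma)


random.seed(0)
x0 = random.gauss(0.0, 1.0)
noise = random.gauss(0.0, 1.0)
sigma = 0.7

x_t = add_noise(x0, noise, sigma)
v_true = noise - x0  # perfect velocity prediction, for illustration only

recovered_v = x0_from_velocity(x_t, v_true, sigma)
recovered_eps = x0_from_noise(x_t, noise, sigma)
```

With an exact prediction, both forms return the original `x0`, so the two formulas differ only in what the network is assumed to output. With a real network the one-step estimate is only approximate, and it is usually least reliable at large `sigma`, which may be worth checking when a pixel-space loss is applied to the decoded image.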
