Description
SageAttention provides low-bit quantization of attention: https://github.com/thu-ml/SageAttention
It seems SageAttention can be used in a plug-and-play way. Does Diffusers plan to support SageAttention as an option?
import torch.nn.functional as F
from sageattention import sageattn
# Route all PyTorch SDPA calls through SageAttention's kernel
F.scaled_dot_product_attention = sageattn
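
For reference, a minimal sketch of how that plug-and-play patch might be tried with a Diffusers pipeline today. This assumes `sageattn` accepts the arguments Diffusers' attention processors pass to `F.scaled_dot_product_attention`; the model id and prompt are just examples:

```python
import torch
import torch.nn.functional as F
from sageattention import sageattn
from diffusers import DiffusionPipeline

# Patch SDPA globally before building the pipeline so every attention
# processor that calls F.scaled_dot_product_attention picks up the
# low-bit SageAttention kernel. (Assumes sageattn is argument-compatible
# with the calls Diffusers makes.)
F.scaled_dot_product_attention = sageattn

# Example model id and prompt, only to illustrate the call order.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("sageattn_test.png")
```

A built-in option (e.g. selectable via an attention-processor setting) would avoid this global monkeypatch, which affects every model in the process, not just the pipeline being run.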