### Describe the bug
When applying a LoRA state dict that is already loaded into memory, a `load_lora_weights()` + `unload_lora_weights()` cycle takes ~5.5 seconds, with the majority of that time spent on repeated dtype queries.
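If repeated per-module dtype queries are indeed the hotspot, caching the query result per module would remove most of the cost. Below is a minimal pure-Python sketch of that pattern; the names (`Module`, `_query_dtype`, `cached_dtype`) are hypothetical stand-ins, not the diffusers API:

```python
import functools

class Module:
    """Toy stand-in for a module whose dtype query is expensive."""
    calls = 0  # counts how often the underlying query actually runs

    def __init__(self, dtype):
        self._dtype = dtype

    def _query_dtype(self):
        # Pretend this walks all parameters, which is the expensive part.
        Module.calls += 1
        return self._dtype

@functools.lru_cache(maxsize=None)
def cached_dtype(module):
    # Cached per module instance: the expensive walk runs at most once.
    return module._query_dtype()

m = Module("float16")
for _ in range(1000):
    assert cached_dtype(m) == "float16"
print(Module.calls)  # the underlying query ran only once
```

The same idea applies inside a load/unload cycle: query each module's dtype once up front, then reuse the cached value instead of re-deriving it for every LoRA layer.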
### Reproduction
```python
import time

import torch
import safetensors.torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
    local_files_only=True,
)
pipe = pipe.to("cuda")

# !wget https://civitai.com/api/download/models/135931 -O loras/pixel-art-xl.safetensors
lora_weights = safetensors.torch.load_file(
    "loras/pixel-art-xl.safetensors", device="cpu"
)

for _ in range(5):
    t0 = time.perf_counter()
    # .copy() because load_lora_weights() mutates the state dict in place.
    pipe.load_lora_weights(lora_weights.copy())
    pipe.unload_lora_weights()
    print("Load + unload cycle took:", time.perf_counter() - t0)
```
### Logs
```
Load + unload cycle took: 5.548293198924512
Load + unload cycle took: 6.468372649978846
Load + unload cycle took: 6.315054736100137
Load + unload cycle took: 5.443292624084279
Load + unload cycle took: 6.357059679925442
```
### System Info
- `diffusers` version: 0.21.0.dev0
- Platform: Linux-5.15.0-71-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.33.1
- Accelerate version: 0.22.0
- xFormers version: 0.0.21
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
### Who can help?
@williamberman, @patrickvonplaten, and @sayakpaul