System Info
- `transformers` version: 4.57.0
- Platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.35
- Python version: 3.12.11
- Huggingface_hub version: 0.35.3
- Safetensors version: 0.6.2
- Accelerate version: 1.10.1
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: MULTI_GPU
  - mixed_precision: bf16
  - use_cpu: False
  - debug: False
  - num_processes: 8
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: all
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - enable_cpu_affinity: False
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.8.0+cu128 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
- Using GPU in script?: yes
- GPU type: NVIDIA H20
Who can help?
@vasqu @ArthurZucker @Cyrilvallez
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
When I set `attn_implementation` to `"paged_attention"`, I hit the error below even with very simple inference code. The error is similar to #39525. Minimal reproduction:
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

if __name__ == "__main__":
    model_path = "/share/models/Qwen3-4B-Thinking-2507"
    tokenizer = AutoTokenizer.from_pretrained(
        model_path, trust_remote_code=True, add_bos_token=False, add_eos_token=False
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        dtype=torch.bfloat16,
        trust_remote_code=True,
        attn_implementation="paged_attention",  # also tried: "flash_attention_2", "sdpa"
        device_map="cuda",
    ).eval()

    prompt = "How are you today?"
    chat_prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}], tokenize=False, add_generation_prompt=True
    )
    chat_prompt_ids = tokenizer(chat_prompt, return_tensors="pt")["input_ids"].to(model.device)
    output = model.generate(
        input_ids=chat_prompt_ids,
        max_new_tokens=500,
        num_beams=1,
        do_sample=False,
        temperature=1.0,
        use_cache=True,
        return_dict_in_generate=True,
        output_logits=True,
    )
    output_text = tokenizer.decode(output["sequences"][0][chat_prompt_ids.size(1):])
    print(output_text)
```
Error:
File "/root/miniconda3/envs/yyj/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 260, in forward
hidden_states, _ = self.self_attn(
^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/yyj/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/yyj/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/yyj/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/yyj/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 216, in forward
attn_output, attn_weights = attention_interface(
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/yyj/lib/python3.12/site-packages/transformers/integrations/flash_paged.py", line 85, in paged_attention_forward
cu_seq_lens_q.to(torch.int32),
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'
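From the traceback, `paged_attention_forward` in `transformers/integrations/flash_paged.py` reads `cu_seq_lens_q` from the attention kwargs, and it arrives as `None`. My understanding (please correct me if wrong) is that these cumulative-sequence-length tensors follow the FlashAttention varlen convention and are normally filled in by the continuous-batching machinery, which a plain `generate()` call never goes through. A minimal sketch of what the kwarg is supposed to contain, with made-up sequence lengths for illustration:

```python
import torch

# Illustration only: FlashAttention's varlen kernels expect cumulative sequence
# lengths marking the boundaries of sequences packed along one dimension.
# For two packed sequences of lengths 5 and 3:
seq_lens = torch.tensor([5, 3])
cu_seq_lens_q = torch.cat(
    [torch.zeros(1, dtype=torch.int64), torch.cumsum(seq_lens, dim=0)]
).to(torch.int32)
print(cu_seq_lens_q)  # tensor([0, 5, 8], dtype=torch.int32)
# This is the kwarg that is None in paged_attention_forward above.
```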
Expected behavior
`generate()` should complete without the `AttributeError` when `attn_implementation="paged_attention"` is set, the same way it does with the other attention backends.
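In the meantime, a workaround sketch for anyone hitting the same crash; it assumes nothing beyond the reproduction script above and just swaps the attention backend to one the plain `generate()` path supports:

```python
# Workaround sketch: keep everything else from the repro script and change only
# the attention backend. The crash happens inside the paged-attention
# integration, so any other backend avoids it.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="sdpa",  # or "flash_attention_2"
    device_map="cuda",
).eval()
```

If `paged_attention` is only meant to be used through the continuous-batching API, it would also help if `generate()` raised a clear error instead of this `AttributeError`.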