hil-serl performs well during training, but poorly when using eval_policy.py #2051

@yfynb1111

Description

System Info

- `lerobot` version: 0.3.3
- Platform: Linux-6.8.0-83-generic-x86_64-with-glibc2.35
- Python version: 3.11.13
- Huggingface_hub version: 0.35.0
- Dataset version: 3.6.0
- Numpy version: 2.2.6
- PyTorch version (GPU?): 2.10.0.dev20250918+cu128 (True)
- Cuda version: 12080
- Using GPU in script?: <fill in>

Information

  • One of the scripts in the examples/ folder of LeRobot
  • My own task or dataset (give details below)

Reproduction

The pick rate is almost 100% during training:
https://github.com/user-attachments/assets/0e84db06-aa08-4d93-a051-531fcb36afb9

but when evaluating with eval_policy.py, the pick rate is 0:
https://github.com/user-attachments/assets/342871fd-5470-4a81-b2f6-0a5c83427fcd

Expected behavior

A solution so that evaluation reproduces the training-time pick rate.
