Does Swift support interleaved multimodal data in GRPO training?

Thanks for your work.
Suppose the training data looks like this:
```
{
    "images": [
        "f00ff96c-4eef-4eec-8d08-ed92ce4fc6af.webp",
        "2a694c18-8393-4247-87a3-402e8e278199.webp",
        "111375e84a4fda29e2eed759d4ce9b.jpg"
    ],
    "messages": [
        {
            "role": "user",
            "content": "The first image: The second image: The third image: Is there a clear reflection visible in the water in the first image? Output your thought process within the <think> </think> tags, then provide your final answer with the <answer> </answer> tags."
        }
    ],
    "solution": "<answer>Yes</answer>"
}
```

How should images be interleaved with text when training with GRPO? I can't find a tag such as `<image>` for marking where each image belongs in the GRPO data format.
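To make the question concrete, here is a minimal sketch of the record I would like to produce. It assumes that an `<image>` placeholder in the message content is matched, in order, to the entries of `images`; whether GRPO training honors that convention is exactly what I am unsure about. The helper `build_record` and the `IMAGE_TAG` constant are hypothetical names for illustration, not part of Swift.

```python
import json

# Assumed placeholder string; not confirmed for GRPO.
IMAGE_TAG = "<image>"


def build_record(segments, solution):
    """Build one training record from ordered ("text", str) / ("image", path) segments."""
    images = []
    parts = []
    for kind, value in segments:
        if kind == "image":
            # Record the path and drop a placeholder at the same position in the text.
            images.append(value)
            parts.append(IMAGE_TAG)
        elif kind == "text":
            parts.append(value)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    content = "".join(parts)
    # Each placeholder must correspond to exactly one image path.
    if content.count(IMAGE_TAG) != len(images):
        raise ValueError("placeholder count does not match number of images")
    return {
        "images": images,
        "messages": [{"role": "user", "content": content}],
        "solution": solution,
    }


record = build_record(
    [
        ("text", "The first image: "),
        ("image", "f00ff96c-4eef-4eec-8d08-ed92ce4fc6af.webp"),
        ("text", " The second image: "),
        ("image", "2a694c18-8393-4247-87a3-402e8e278199.webp"),
        ("text", " The third image: "),
        ("image", "111375e84a4fda29e2eed759d4ce9b.jpg"),
        ("text", " Is there a clear reflection visible in the water in the first image?"),
    ],
    solution="<answer>Yes</answer>",
)

# One JSON object per line, as in a JSONL dataset file.
print(json.dumps(record, ensure_ascii=False))
```

If this convention applies, each image would be placed at the position of its placeholder instead of all images being attached to the prompt without any positional information.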

Does Swift support interleaved multimodal data in GRPO training? #4905
